Dataset Card for PaintBerri
Dataset Summary
This dataset contains hand-drawn artwork collected from PaintBerri. The dataset includes images along with associated metadata such as publication dates, titles, descriptions, and dimensions.
Languages
The dataset is monolingual:
- English (en): Image descriptions and metadata are primarily in English
Dataset Structure
Data Files
The dataset consists of:
- Image files (stored across multiple ZIP files in the "images" directory, with approximately 1,000 images in each archive)
- Corresponding metadata in JSONL format containing information about the images, including URLs, titles, dimensions, and publication dates
- An archive index CSV file mapping image IDs to their respective archive files
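Since the metadata is stored as JSONL, it can be read one record per line without loading the whole file into memory. The following is a minimal sketch, assuming the metadata file has already been downloaded (and decompressed, if it is compressed); the helper name `iter_records` is hypothetical and not part of the dataset.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one metadata record (a dict) per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Made-up lines for illustration only; real records come from the metadata file.
sample_lines = ['{"short_id": "example1", "width": 640, "height": 480}', ""]
records = list(iter_records(sample_lines))
```

An open text file can be passed directly as `lines`, because iterating over a file object yields one line at a time.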
Data Fields
- image: URL of the full-sized image
- published: Timestamp when the image was first published (UTC)
- modified: Timestamp when the image was last modified (UTC)
- description: Description of the image provided by the creator (may be empty)
- title: Title of the image
- is_nsfw: Boolean flag indicating whether the image contains not-safe-for-work content
- thumbnail: URL of the thumbnail version of the image
- height: Height of the image in pixels
- width: Width of the image in pixels
- creator_id: ID of the image creator (may be null)
- short_id: Short unique identifier for the image
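The fields above map naturally onto a small typed record. The sketch below is one possible representation, not a schema published with the dataset: the class `ImageRecord`, the helpers `from_dict` and `is_safe_and_large`, and the 512-pixel threshold are all hypothetical. Timestamps are kept as strings because their exact format is not specified here, and the type of `creator_id` is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ImageRecord:
    image: str                 # URL of the full-sized image
    published: str             # timestamp (UTC), kept as a string; format not assumed
    modified: str              # timestamp (UTC), kept as a string; format not assumed
    description: str           # may be empty
    title: str
    is_nsfw: bool
    thumbnail: str             # URL of the thumbnail version
    height: int                # pixels
    width: int                 # pixels
    creator_id: Optional[str]  # may be null; string type is an assumption
    short_id: str


def from_dict(raw: dict) -> ImageRecord:
    """Build an ImageRecord from one decoded JSONL object, ignoring unknown keys."""
    return ImageRecord(**{f.name: raw.get(f.name) for f in fields(ImageRecord)})


def is_safe_and_large(rec: ImageRecord, min_side: int = 512) -> bool:
    """True for non-NSFW images whose shorter side is at least min_side pixels."""
    return (not rec.is_nsfw) and min(rec.height, rec.width) >= min_side
```

A filter like `is_safe_and_large` only needs the metadata, so a subset can be selected before any image archive is opened.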
Data Splits
All images and metadata are in a single split with 68,860 entries.
Additional Information
Dataset Structure Supplemental
For easier access to individual images, an archive index file is provided:
- archive_index.csv.zst: Maps each image file to its containing ZIP archive (format: "archive_name.zip","image_file_id.jpg")
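The index makes it possible to fetch a single image by opening only the archive that contains it. The sketch below rests on several assumptions: the index has already been decompressed to plain CSV (for example with the `zstd` command-line tool), it has no header row, and each image is stored in its ZIP under the same file name that appears in the index. The function names `build_index` and `read_image` are hypothetical.

```python
import csv
import zipfile
from pathlib import Path
from typing import Dict, Iterable


def build_index(csv_lines: Iterable[str]) -> Dict[str, str]:
    """Map image file name -> archive name from rows like "archive.zip","image.jpg"."""
    index: Dict[str, str] = {}
    for row in csv.reader(csv_lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        archive_name, image_file = row
        index[image_file] = archive_name
    return index


def read_image(index: Dict[str, str], image_file: str, images_dir: Path) -> bytes:
    """Return the raw bytes of one image, opening only the ZIP that holds it."""
    archive_path = images_dir / index[image_file]
    with zipfile.ZipFile(archive_path) as zf:
        # Assumes the member name inside the ZIP equals image_file.
        return zf.read(image_file)
```

If the index does turn out to contain a header row, it would appear as an ordinary entry in the dictionary and should be dropped before lookups.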