Dataset Card for PaperDemon
Dataset Summary
This dataset contains artwork collected from PaperDemon: artwork images together with associated metadata such as titles, posting dates, descriptions, tags, and user comments.
Languages
The dataset is primarily monolingual:
- English (en): Most artwork descriptions and metadata are in English, though some artists may include multilingual content in their descriptions or comments.
Dataset Structure
Data Files
The dataset consists of:
- Artwork image files (stored across multiple ZIP archives, with approximately 1,000 images in each archive)
- Corresponding metadata in JSONL format containing information about the artworks, including URLs, titles, descriptions, tags, and user comments
- An archive index CSV file mapping image filenames to their respective archive files
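Because images are spread over many ZIP archives, the index CSV lets you open only the archive that holds a given image instead of scanning all of them. The sketch below is a minimal illustration under stated assumptions: the CSV column names `filename` and `archive`, and the helper names `load_archive_index` and `read_image_bytes`, are hypothetical, and it assumes each image is stored inside its ZIP under the same name that appears in the index. Adjust these to the actual file layout.

```python
import csv
import zipfile
from pathlib import Path


def load_archive_index(csv_path):
    """Map each image filename to the ZIP archive that contains it.

    Hypothetical assumption: the index CSV has a header row with columns
    named "filename" and "archive"; rename these to match the real headers.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["filename"]: row["archive"] for row in csv.DictReader(f)}


def read_image_bytes(filename, index, archive_dir):
    """Return the raw bytes of one image, opening only the archive that holds it."""
    archive_name = index.get(filename)
    if archive_name is None:
        raise KeyError(f"{filename!r} is not listed in the archive index")
    archive_path = Path(archive_dir) / archive_name
    with zipfile.ZipFile(archive_path) as zf:
        # Assumes the member path inside the ZIP equals the indexed filename.
        return zf.read(filename)
```

Building the index once and reusing it keeps each lookup to a single dictionary access plus one archive open, which matters when the images are split across dozens of archives of roughly 1,000 files each.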
Data Fields
- id: Unique identifier for the artwork
- title: Title of the artwork
- posted_date_utc: Date and time when the artwork was posted (UTC)
- image_url: URL of the artwork image
- description: Description of the artwork provided by the artist
- tags: Array of tags associated with the artwork, each containing:
  - text: Tag name
  - type: Type of tag (e.g., "community", "game")
- characters: Array of characters featured in the artwork (may be empty)
- comments: Array of user comments, each containing:
  - timestamp_utc: Date and time when the comment was posted (UTC)
  - text: Content of the comment
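The nested fields above (tags with a text and a type, comments with a timestamp and text) map naturally onto small typed records. The following is a minimal sketch, not an official loader: the class and function names are hypothetical, `id` is normalized to a string because its stored type is not documented, and timestamps are kept as plain strings since their exact format is not specified. It parses one JSONL line per record and counts tags by type.

```python
import json
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Tag:
    text: str
    type: str  # e.g. "community" or "game"


@dataclass
class Comment:
    timestamp_utc: str  # left as a string; the timestamp format is not specified
    text: str


@dataclass
class Artwork:
    id: str
    title: str
    posted_date_utc: str
    image_url: str
    description: str
    tags: list = field(default_factory=list)
    characters: list = field(default_factory=list)
    comments: list = field(default_factory=list)


def parse_record(line):
    """Parse one JSONL line into an Artwork, tolerating missing or null arrays."""
    raw = json.loads(line)
    return Artwork(
        id=str(raw["id"]),
        title=raw.get("title") or "",
        posted_date_utc=raw.get("posted_date_utc") or "",
        image_url=raw.get("image_url") or "",
        description=raw.get("description") or "",
        tags=[Tag(t["text"], t["type"]) for t in raw.get("tags") or []],
        characters=list(raw.get("characters") or []),
        comments=[Comment(c["timestamp_utc"], c["text"]) for c in raw.get("comments") or []],
    )


def tag_type_counts(lines):
    """Count how often each tag type appears across all records, skipping blank lines."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts.update(tag.type for tag in parse_record(line).tags)
    return counts
```

Because `tag_type_counts` accepts any iterable of lines, an open file handle on the metadata JSONL can be passed directly, so the file is processed one record at a time rather than loaded into memory at once.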
Data Splits
All artworks and metadata are in a single split with 45,970 entries.