Dataset
nyuuzyou/paintberri

Dataset Card for PaintBerri

Dataset Summary

This dataset contains hand-drawn artwork collected from PaintBerri. The dataset includes images along with associated metadata such as publication dates, titles, descriptions, and dimensions.

Languages

The dataset is primarily monolingual:

  • English (en): Image descriptions and metadata are primarily in English

Dataset Structure

Data Files

The dataset consists of:

  • Image files (stored across multiple ZIP files in the "images" directory, with approximately 1,000 images in each archive)
  • Corresponding metadata in JSONL format containing information about the images, including URLs, titles, dimensions, and publication dates
  • An archive index CSV file mapping image IDs to their respective archive files

Data Fields

  • image: URL of the full-sized image
  • published: Timestamp when the image was first published (UTC)
  • modified: Timestamp when the image was last modified (UTC)
  • description: Description of the image provided by the creator (may be empty)
  • title: Title of the image
  • is_nsfw: Boolean flag indicating whether the image contains not-safe-for-work content
  • thumbnail: URL of the thumbnail version of the image
  • height: Height of the image in pixels
  • width: Width of the image in pixels
  • creator_id: ID of the image creator (may be null)
  • short_id: Short unique identifier for the image
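As an illustration of the fields listed above, the following minimal sketch parses JSONL metadata lines into typed records and filters out entries flagged as NSFW. The names ArtworkRecord, parse_records, and safe_only are hypothetical helpers, not part of the dataset. The card does not state the timestamp format or the type of creator_id, so those values are kept exactly as they appear in the JSON.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, List


@dataclass
class ArtworkRecord:
    """One metadata entry; field names follow the dataset card."""
    image: str          # URL of the full-sized image
    published: Any      # timestamp (UTC); format not stated, kept as-is
    modified: Any       # timestamp (UTC); format not stated, kept as-is
    description: str    # may be empty
    title: str
    is_nsfw: bool
    thumbnail: str      # URL of the thumbnail
    height: int         # pixels
    width: int          # pixels
    creator_id: Any     # may be None; concrete type not stated in the card
    short_id: str


def parse_records(lines: Iterable[str]) -> Iterator[ArtworkRecord]:
    """Yield one ArtworkRecord per non-blank JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        yield ArtworkRecord(
            image=row["image"],
            published=row["published"],
            modified=row["modified"],
            # Assumption: a missing or null description is treated as empty.
            description=row.get("description") or "",
            title=row["title"],
            is_nsfw=bool(row["is_nsfw"]),
            thumbnail=row["thumbnail"],
            height=int(row["height"]),
            width=int(row["width"]),
            creator_id=row.get("creator_id"),
            short_id=row["short_id"],
        )


def safe_only(records: Iterable[ArtworkRecord]) -> List[ArtworkRecord]:
    """Keep only records that are not flagged as NSFW."""
    return [r for r in records if not r.is_nsfw]
```

Because parse_records accepts any iterable of strings, it can be fed an open text file handle for the metadata file as well as an in-memory list of lines.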

Data Splits

All images and metadata are in a single split with 68,860 entries.

Additional Information

Dataset Structure Supplemental

For easier access to individual images, an archive index file is provided:

  • archive_index.csv.zst: Maps each image file to its containing ZIP archive (format: "archive_name.zip","image_file_id.jpg")
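The lookup described above could be sketched as follows, using only the Python standard library. The helper names build_archive_lookup and read_image_bytes are hypothetical. The sketch assumes the index has already been decompressed to text (Zstandard decompression needs a separate library and is not shown), that each row has exactly the two columns given in the format above, and that each image is stored inside its ZIP under the same file name that appears in the index.

```python
import csv
import io
import zipfile
from typing import BinaryIO, Dict, Iterable, Union


def build_archive_lookup(index_lines: Iterable[str]) -> Dict[str, str]:
    """Map image file name -> name of the ZIP archive that contains it.

    Expects decompressed CSV lines of the form
    "archive_name.zip","image_file_id.jpg".
    """
    lookup: Dict[str, str] = {}
    for row in csv.reader(index_lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        archive_name, image_file = row
        # If the file has a header row, it ends up as one harmless extra entry.
        lookup[image_file] = archive_name
    return lookup


def read_image_bytes(archive: Union[str, BinaryIO], image_file: str) -> bytes:
    """Read one image from a ZIP archive given as a path or file-like object.

    Assumption: the member name inside the archive equals image_file.
    """
    with zipfile.ZipFile(archive) as zf:
        return zf.read(image_file)
```

With the lookup built once, finding an image means one dictionary access to get the archive name, then opening only that archive from the "images" directory rather than scanning every ZIP file.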