Overview - Comics Understanding
Comics, as a medium, uniquely combine text and images in styles often distinct from real-world visuals. For the past three decades, computational research on comics has evolved from basic object detection to more sophisticated tasks. However, the field faces persistent challenges such as:
- small datasets,
- inconsistent annotations,
- inaccessible model weights,
- results that are not directly comparable because of varying train/test splits and metrics.
To address these issues, we aim to standardize annotations across datasets, introduce a variety of comic styles into the datasets, and establish benchmark results with clear, replicable settings. The Comics Dataset Framework [1] provides standardized detection annotations, together with conversion scripts for the images of existing datasets. In addition, the recent CoMix dataset [2] adds multi-task annotations to existing manga and comics datasets, extending the set of comic styles to a balanced combination of both (see Figure 1).
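As a rough illustration of what standardizing detection annotations can involve, the sketch below converts corner-style boxes into a COCO-style [x, y, width, height] layout with a shared label-to-id mapping. It is not the conversion code of the Comics Dataset Framework: the input field names (image_id, label, box), the class names, and the choice of a COCO-style target are all assumptions made for this example.

```python
from typing import Dict, List


def xyxy_to_xywh(box: List[float]) -> List[float]:
    """Convert [x_min, y_min, x_max, y_max] to [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def to_common_format(records: List[dict], category_ids: Dict[str, int]) -> List[dict]:
    """Map source records to one shared annotation layout.

    Each input record is assumed (hypothetically) to look like:
    {"image_id": int, "label": str, "box": [x_min, y_min, x_max, y_max]}
    """
    annotations = []
    for ann_id, rec in enumerate(records, start=1):
        annotations.append(
            {
                "id": ann_id,
                "image_id": rec["image_id"],
                "category_id": category_ids[rec["label"]],
                "bbox": xyxy_to_xywh(rec["box"]),
            }
        )
    return annotations


# Hypothetical class names, chosen only for this example.
CATEGORY_IDS = {"panel": 1, "character": 2, "text": 3}

example = to_common_format(
    [{"image_id": 7, "label": "panel", "box": [10, 20, 110, 70]}],
    CATEGORY_IDS,
)
```

Once every source dataset is mapped to one layout like this, the same evaluation code can be run on all of them, which is what makes results comparable across datasets.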
With recent advances in vision-and-language models [3,4] and in applications tailored to comics [5], current evaluation metrics and datasets for comics often lag behind model capabilities and remain confined to small or single-style sets. The CoMix benchmark is designed to assess the multi-task capabilities of comic analysis models: it provides annotations for reading order, character naming, and dialog generation, and proposes a new metric for evaluating models on these tasks. The specifics of the multi-task CoMix benchmark are provided in Figure 2.
The validation split is provided together with its annotations. The held-out test set is available through the server Task.
References:
[1]: Vivoli et al., "Comics Datasets Framework: Mix of Comics Datasets for benchmarking", 2024, arXiv
[2]: Vivoli et al., "CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding", 2024, arXiv
[3]: OpenAI, "GPT-4 Technical Report", 2023, arXiv
[4]: OpenBMB, "MiniCPM-V 2.5", 2024, blog
[5]: Sachdeva et al., "The Manga Whisperer: Automatically Generating Transcriptions for Comics", 2024, CVPR
Challenge News
Important Dates
Competition dates will be released in November 2024