Overview - Comics Understanding
Comics, as a medium, uniquely combine text and images in styles often distinct from real-world visuals. For the past three decades, computational research on comics has evolved from basic object detection to more sophisticated tasks. However, the field faces persistent challenges such as:
- small datasets,
- inconsistent annotations,
- inaccessible model weights, and
- results that are not directly comparable because of varying train/test splits and metrics.
To address these issues, we aim to standardize annotations across datasets, introduce a wider variety of comic styles into the datasets, and establish benchmark results with clear, replicable settings. The Comics Datasets Framework [1] provides standardized detection annotations, together with conversion scripts for the images of existing datasets. Building on this, the recent CoMix dataset [2] adds multi-task annotations to the existing manga and comics datasets, extending the range of comic styles to a balanced combination of both (see Figure 1).
Meanwhile, despite recent advances in vision-language models [3,4] and in applications tailored to comics [5], evaluation metrics and datasets for comics often lag behind these models, being confined to small or single-style sets. The CoMix benchmark is designed to assess the multi-task capabilities of comic analysis models: it provides reading order annotations and character naming and dialog generation tasks, and proposes a new metric to evaluate models on them. The specifics of the multi-task CoMix benchmark are given in Figure 2.
The validation split is released together with its annotations. The held-out test set is available only through the server's Task page.
[1]: Comics Datasets Framework: Mix of Comics Datasets for benchmarking, 2024, link?
[2]: CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding, 2024, link?
[3]: GPT-4 Technical Report, 2023, arXiv
[4]: MiniCPM-V 2.5, 2024, blog
[5]: The Manga Whisperer: Automatically Generating Transcriptions for Comics, 2024, CVPR 2024