Overview - Comics Understanding
Comics, as a medium, uniquely combine text and images in styles often distinct from real-world visuals. For the past three decades, computational research on comics has evolved from basic object detection to more sophisticated tasks. However, the field faces persistent challenges such as:
- small datasets,
- inconsistent annotations,
- inaccessible model weights,
- results that are not directly comparable due to varying train/test splits and metrics.
To address these issues, we aim to standardize annotations across datasets, introduce a wider variety of comic styles, and establish benchmark results with clear, replicable settings. The Comics Datasets Framework [1] provides standardized detection annotations and conversion scripts for existing dataset images. Moreover, the recent CoMix dataset [2] adds multi-task annotations to existing manga and comics datasets, extending the covered styles to a balanced combination of the two (see Figure 1, left). The CoMix GitHub repository contains the code for both [1] and [2].
These works led to the definition of various tasks, collected under the NeurIPS 2024 CoMix dataset and framework and the ICDAR 2025 COMICS challenge (Challenge on Comics Understanding).
Figure 1. Composition of the CoMix benchmark. Qualitative representation of the datasets (left-top) and differences between the original annotations and those extended in CoMix (left-bottom). An illustration of the annotation is also provided (right).
Moreover, despite recent advancements in Vision and Language models [3,4] and in applications tailored to comics [5,6], evaluation metrics and datasets for comics often lag behind model progress, remaining confined to small or single-style sets. The CoMix benchmark is designed to assess the multi-task capabilities of comic analysis models, providing reading-order annotations, character naming, and dialog generation, and proposing a new metric to evaluate models on these benchmarks. The specifics of the multi-task CoMix benchmark are shown on the right side of Figure 1.
Building on this progress, and on a recent survey on comics [7] that identifies the gap between the Vision-Language world and comics analysis, we have designed a set of sequence-processing tasks. These are the tasks included in the ICDAR 2025 competition, based on the ``pick a panel`` format: given a context and a set of candidate panels, the model must identify the correct choice. The tasks are framed as classification tasks, all in the format of multi-panel input (with text) and the correct option index as the answer.
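To make the input/output format concrete, below is a minimal sketch of how such a sample could be represented and scored. All class and field names (`PickAPanelSample`, `solution_index`, etc.) are illustrative assumptions, not the official challenge schema.

```python
# Minimal sketch of a multi-panel classification sample; field names are
# illustrative assumptions, not the official challenge schema.
from dataclasses import dataclass


@dataclass
class PickAPanelSample:
    sample_id: str            # hypothetical unique sample identifier
    context: list[str]        # context panel images, in reading order
    context_texts: list[str]  # text (dialogues/captions) attached to the context
    options: list[str]        # candidate panel images to choose from
    solution_index: int       # index of the correct option (classification target)


def is_correct(sample: PickAPanelSample, predicted_index: int) -> bool:
    """Score a single prediction: plain option-index classification."""
    return predicted_index == sample.solution_index
```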
Task 1: Pick-a-Panel (ICDAR 2025)
The challenge comprises a single task named ``pick a panel``, through which we frame three different skills, briefly described here. For more detailed information, refer to the Tasks section:
- Skill 1 - Sequence Filling: choose the correct panel among a set of options. The context is a sequence of panels with one panel missing at a specified location. The context length ranges from 3 to 7, and the missing-panel index from -1 to len(context); the missing panel may therefore also be the one immediately preceding the context (-1) or immediately following it (len(context)). See the index sketch after this list.
- Skill 2 - Closure: the well-known closure task originally proposed for comics [8], with the text-cloze task reframed for (i) panel options as input and (ii) text options as input. As proposed in [9], the context length may vary in this case as well, as may the set of options.
- Skill 3 - Caption Relevance: this task inherits from the previous ones but adds a layer of complexity: the context is given solely as a detailed textual description of the previous panel, from which the model must pick the panel that best follows.
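As a concrete illustration of the Sequence Filling index convention above, the following sketch (function and variable names are our own, not from the challenge code) maps a missing-panel index onto its position relative to the context:

```python
# Illustration of the Sequence Filling index convention described above:
# for a context of length n, the missing-panel index ranges over
# -1 (panel immediately before the context) through n (panel immediately after).
def describe_missing_position(context_len: int, missing_index: int) -> str:
    assert -1 <= missing_index <= context_len, "index out of the allowed range"
    if missing_index == -1:
        return "panel preceding the context"
    if missing_index == context_len:
        return "panel following the context"
    return f"panel at position {missing_index} inside the context"


for idx in (-1, 0, 2, 5):
    print(describe_missing_position(5, idx))
```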
Task 2: Multi-task Single-page (CoMix Benchmark)
For this task, please refer to the NeurIPS 2024 CoMix benchmark (hosted at https://github.com/emanuelevivoli/CoMix), where instructions for gathering the data, the model weights, and the validation split (with annotations) are provided. The task will be progressively hosted here, and the held-out test set will be made available through the task server soon. More to come in the coming months!
References:
[1] Vivoli et al., "Comics Datasets Framework: Mix of Comics Datasets for Benchmarking", ICDAR 2024.
[2] Vivoli et al., "CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding", NeurIPS 2024.
[3] OpenAI, "GPT-4 Technical Report", arXiv, 2023.
[4] OpenBMB, "MiniCPM-V 2.5", blog, 2024.
[5] Sachdeva et al., "The Manga Whisperer: Automatically Generating Transcriptions for Comics", CVPR 2024.
[6] Sachdeva et al., "Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names", ACCV 2024.
[7] Vivoli et al., "One Missing Piece in Vision and Language: A Survey on Comics Understanding", 2024, under revision.
[8] Iyyer et al., "The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives", CVPR 2017.
[9] Vivoli et al., "Multimodal Transformer for Comics Text-Cloze", ICDAR 2024.
Challenge News
Important Dates
ICDAR 2025 Edition
17-21/09/2025: Results presentation
30/04/2025: Camera-ready of competition report
20/04/2025: Initial submission of competition report
15/04/2025: Deadline for Competition submissions
25/02/2025: Benchmark model and Dev sets v0.1
19/02/2025: Test/Val sets v0.1 available
10/01/2025: Tasks have been updated for the ICDAR 2025 COMICS competition at https://www.icdar2025.com/program/competitions
CoMix Benchmark
November 2024: The CoMix dataset & evaluation repository (https://github.com/emanuelevivoli/CoMix) has been released.