- Task 1 - Detection
- Task 2 - Detection-Linking
- Task 3 - Detection-Recognition
- Task 4 - Detection-Recognition-Linking
method: MapText Strong Pipeline (2025-03-31)
Authors: Yu Xie, Canhui Xu, Jielei Zhang, Pengyu Chen, Weihang Wang, Yuchen He, Peiyi Li, Yihan Meng, Longwen Gao
Affiliation: Bilibili Inc., QUST
Description: For the English MapText recognition task, we employed DNTextSpotter, a novel denoising training method based on DeepSolo; for the Chinese MapText recognition task, we utilized DeepSolo. Data augmentation techniques, including cropping, scaling, and adjustments to saturation and contrast, were applied (a minimal augmentation sketch is given below). Pre-training was conducted on available real-world datasets such as TextOCR, TotalText, IC15, and MLT2017, and post-processing methods were also adopted.
@inproceedings{ye2023deepsolo, title={Deepsolo: Let transformer decoder with explicit points solo for text spotting}, author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng}, booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, pages={19348--19357}, year={2023} }
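The submission does not state how these augmentations were implemented; the following is a minimal illustrative sketch, assuming torchvision and arbitrarily chosen parameter values (neither is specified by the authors):

```python
# Illustrative only: torchvision and the parameter values below are assumptions,
# not details taken from the MapText Strong Pipeline submission.
from torchvision import transforms

train_augment = transforms.Compose([
    # Random crop of 50-100% of the image area, rescaled to a fixed size
    # (the "cropping" and "scaling" augmentations mentioned above).
    transforms.RandomResizedCrop(size=1024, scale=(0.5, 1.0)),
    # Photometric jitter of saturation and contrast.
    transforms.ColorJitter(saturation=0.3, contrast=0.3),
    transforms.ToTensor(),
])

# Note: for text spotting, geometric augmentations must also be applied to the
# ground-truth polygons / Bézier control points; that bookkeeping is omitted here.
```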
method: Self-Sequencer (2025-03-28)
Authors: Mengjie Zou, Tianhao Dai, Remi Petitpierre, Beatrice Vaienti, Frederic Kaplan, Isabella di Lenardo
Affiliation: EPFL, Swiss Federal Institute of Technology in Lausanne
Email: remi.petitpierre@epfl.ch
Description: For word detection and recognition, our approach relies on DeepSolo, whose architecture is derived from Detection Transformers (DETR). In short, DeepSolo extracts hierarchical visual features from map images and processes them through an encoder-decoder architecture to detect words as segments bounded by Bézier curves. The model specifically returns four control points of a central Bézier curve per word and then uniformly samples query points along this curve to segment, classify, and delineate each text instance precisely. To resolve duplicate word detections, we implement a postprocessing step inspired by Non-Maximum Suppression: it computes the Fréchet distance between the Bézier curves of potential duplicate word pairs, or "directional synonyms", and merges those below a defined threshold (a minimal sketch of this step is given below). More details on the model, algorithms, and specific implementation are provided in our separate article [1].
The model training leverages several real and synthetic datasets: ICDAR MapText [2], MapKuratorHuman [3], SynthMap [3], and the Paris and Jerusalem Maps Text Dataset [4].
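The deduplication step above can be sketched as follows; function names, the detection dict layout, and the threshold value are hypothetical, and the authors' actual implementation is described in [1]. The sketch samples points along each word's central cubic Bézier curve from its four control points, computes the discrete Fréchet distance between curve pairs, and greedily keeps the higher-scoring detection of any pair closer than the threshold.

```python
# Hypothetical sketch of Fréchet-based duplicate suppression; names and the
# threshold are illustrative, not taken from the Self-Sequencer implementation.
import numpy as np

def sample_bezier(ctrl_pts, n=20):
    """Uniformly sample n points along a cubic Bézier curve from its 4 control points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in ctrl_pts)
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two polylines (dynamic programming)."""
    n, m = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise distances
    ca = np.full((n, m), np.inf)
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return ca[-1, -1]

def suppress_duplicates(detections, frechet_threshold=10.0):
    """NMS-style pass: keep the higher-scoring word of any pair whose central
    Bézier curves lie within the Fréchet threshold ("directional synonyms")."""
    detections = sorted(detections, key=lambda det: det["score"], reverse=True)
    kept = []
    for det in detections:
        curve = sample_bezier(det["bezier_ctrl_pts"])
        if all(discrete_frechet(curve, kept_curve) > frechet_threshold
               for _, kept_curve in kept):
            kept.append((det, curve))
    return [det for det, _ in kept]
```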
References:
[1] Zou, M., Dai, T., Petitpierre, R., Vaienti, B., Kaplan, F., & di Lenardo, I. (2025). Recognizing and Sequencing Multi-word Texts in Maps Using an Attentive Pointer.
[2] Lin, Y., Li, Z., Chiang, Y.Y., & Weinman, J. (2024). Rumsey Train and Validation Data for ICDAR'24 MapText Competition (Version 1.3). Zenodo. https://doi.org/10.5281/zenodo.11516933
[3] Kim, J., Li, Z., Lin, Y., Namgung, M., Jang, L., & Chiang, Y.Y. (2023). The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps. In: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems. https://arxiv.org/abs/2306.17059
[4] Dai, T., Johnson, K., Petitpierre, R., Vaienti, B., & di Lenardo, I. (2025). Paris and Jerusalem Maps Text Dataset (Version 1.0.0). Zenodo. https://doi.org/10.5281/zenodo.14982662
method: Baseline TESTR Finetuned (2025-04-29)
Authors: Organizers
Affiliation: ICDAR’25 RRC-MapText
Description: The TESTR checkpoint with the polygon prediction head, previously tuned on TotalText, is further finetuned on the competition training data available for each dataset.
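As context, this baseline amounts to standard checkpoint finetuning in the detectron2 training pattern on which TESTR is built. The sketch below is generic and hedged: the config path, checkpoint file, dataset names, and solver values are placeholders, and TESTR's actual repository extends the detectron2 config with its own keys and requires registering the competition data as detectron2 datasets.

```python
# Generic detectron2-style finetuning sketch. Paths, dataset names, and solver
# settings are placeholders; TESTR's own repository adds custom config keys and
# dataset registration on top of this pattern.
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file("configs/TESTR_polygon.yaml")   # placeholder config file
cfg.MODEL.WEIGHTS = "testr_totaltext_polygon.pth"   # TotalText-tuned checkpoint (placeholder)
cfg.DATASETS.TRAIN = ("maptext_train",)             # competition training split (must be registered)
cfg.DATASETS.TEST = ("maptext_val",)
cfg.SOLVER.BASE_LR = 1e-4                           # reduced learning rate for finetuning
cfg.SOLVER.MAX_ITER = 20000
cfg.OUTPUT_DIR = "./output_maptext_finetune"

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)  # load MODEL.WEIGHTS without resuming the iteration counter
trainer.train()
```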
Results (leaderboard metric groups: Overall, Words):

| Date | Method | H-Mean | Precision | Recall | Tightness | Char Accuracy | Char Quality | Det Quality | F-Score |
|---|---|---|---|---|---|---|---|---|---|
| 2025-03-31 | MapText Strong Pipeline | 91.13% | 95.88% | 91.84% | 83.75% | 94.04% | 73.89% | 78.57% | 93.82% |
| 2025-03-28 | Self-Sequencer | 90.30% | 91.52% | 89.13% | 86.14% | 94.86% | 73.79% | 77.79% | 90.31% |
| 2025-04-29 | Baseline TESTR Finetuned | 89.53% | 89.14% | 90.04% | 86.28% | 92.92% | 71.82% | 77.30% | 89.59% |
| 2025-01-10 | [Baseline MapText '24] MapText Detection and Recognition Strong Pipeline | 89.26% | 96.16% | 85.01% | 83.27% | 93.97% | 70.61% | 75.14% | 90.24% |
| 2025-01-10 | [Baseline MapText '24] MapTest | 87.37% | 90.47% | 88.23% | 81.82% | 89.51% | 65.42% | 73.09% | 89.34% |
| 2025-04-19 | CREPE + BezierCurve | 84.93% | 87.10% | 86.53% | 73.62% | 95.47% | 61.02% | 63.91% | 86.81% |
| 2025-01-10 | [Baseline MapText '24] MapTextSpotter | 84.52% | 92.61% | 81.51% | 81.44% | 83.46% | 58.94% | 70.62% | 86.71% |
| 2025-01-10 | [Baseline MapText '24] DS-LP | 77.55% | 71.76% | 78.93% | 71.63% | 90.83% | 48.90% | 53.84% | 75.17% |
| 2025-01-10 | [Baseline MapText '24] TESTR Checkpoint | 74.61% | 71.85% | 66.90% | 79.55% | 82.10% | 45.26% | 55.12% | 69.29% |
| 2025-04-20 | Word-Level Text Detection and Recognition on Historical Maps Using Preprocessing and PaddleOCR | 59.46% | 59.54% | 40.01% | 73.89% | 83.71% | 29.60% | 35.36% | 47.86% |
| 2025-04-19 | YOLOv8_ViTAE_PolygonDetector | 27.42% | 61.75% | 9.64% | 75.97% | 77.96% | 9.88% | 12.67% | 16.68% |