- Task 1 - Detection
- Task 2 - Detection-Linking
- Task 3 - Detection-Recognition
- Task 4 - Detection-Recognition-Linking
method: MapText Strong Pipeline (2025-03-31)
Authors: Yu Xie, Canhui Xu, Jielei Zhang, Pengyu Chen, Weihang Wang, Yuchen He, Peiyi Li, Yihan Meng, Longwen Gao
Affiliation: Bilibili Inc., QUST
Description: For the English MapText recognition task, we employed DNTextSpotter, a novel denoising training method based on DeepSolo; for the Chinese MapText recognition task, we utilized DeepSolo. Data augmentation techniques, including cropping, scaling, and adjustments to saturation and contrast, were applied (a minimal augmentation sketch is given below). Pre-training was conducted on available real-world datasets such as TextOCR, TotalText, IC15, and MLT2017, and post-processing methods were also adopted.
@inproceedings{ye2023deepsolo, title={Deepsolo: Let transformer decoder with explicit points solo for text spotting}, author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng}, booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, pages={19348--19357}, year={2023} }
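The submission does not state how these augmentations were implemented; the following is a minimal illustrative sketch, assuming torchvision and arbitrarily chosen parameter values (neither is specified by the authors):

```python
# Illustrative only: torchvision and the parameter values below are assumptions,
# not details taken from the MapText Strong Pipeline submission.
from torchvision import transforms

train_augment = transforms.Compose([
    # Random crop of 50-100% of the image area, rescaled to a fixed size
    # (the "cropping" and "scaling" augmentations mentioned above).
    transforms.RandomResizedCrop(size=1024, scale=(0.5, 1.0)),
    # Photometric jitter of saturation and contrast.
    transforms.ColorJitter(saturation=0.3, contrast=0.3),
    transforms.ToTensor(),
])

# Note: for text spotting, geometric augmentations must also be applied to the
# ground-truth polygons / Bézier control points; that bookkeeping is omitted here.
```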
method: Self-Sequencer (2025-03-28)
Authors: Mengjie Zou, Tianhao Dai, Remi Petitpierre, Beatrice Vaienti, Frederic Kaplan, Isabella di Lenardo
Affiliation: EPFL, Swiss Federal Institute of Technology in Lausanne
Email: remi.petitpierre@epfl.ch
Description: For word detection and recognition, our approach relies on DeepSolo, whose architecture is derived from Detection Transformers (DETR). In short, DeepSolo extracts hierarchical visual features from map images and processes them through an encoder-decoder architecture to detect words as segments bounded by Bézier curves. The model specifically returns four control points of a central Bézier curve per word and then uniformly samples query points along this curve to segment, classify, and delineate each text instance precisely. To resolve duplicate word detections, we implement a postprocessing step inspired by Non-Maximum Suppression: it computes the Fréchet distance between the Bézier curves of potential duplicate word pairs, or "directional synonyms", and merges those below a defined threshold (a minimal sketch of this step is given below). More details on the model, algorithms, and specific implementation are provided in our separate article [1].
The model training leverages several real and synthetic datasets: ICDAR MapText [2], MapKuratorHuman [3], SynthMap [3], and the Paris and Jerusalem Maps Text Dataset [4].
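The deduplication step above can be sketched as follows; function names, the detection dict layout, and the threshold value are hypothetical, and the authors' actual implementation is described in [1]. The sketch samples points along each word's central cubic Bézier curve from its four control points, computes the discrete Fréchet distance between curve pairs, and greedily keeps the higher-scoring detection of any pair closer than the threshold.

```python
# Hypothetical sketch of Fréchet-based duplicate suppression; names and the
# threshold are illustrative, not taken from the Self-Sequencer implementation.
import numpy as np

def sample_bezier(ctrl_pts, n=20):
    """Uniformly sample n points along a cubic Bézier curve from its 4 control points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in ctrl_pts)
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two polylines (dynamic programming)."""
    n, m = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise distances
    ca = np.full((n, m), np.inf)
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return ca[-1, -1]

def suppress_duplicates(detections, frechet_threshold=10.0):
    """NMS-style pass: keep the higher-scoring word of any pair whose central
    Bézier curves lie within the Fréchet threshold ("directional synonyms")."""
    detections = sorted(detections, key=lambda det: det["score"], reverse=True)
    kept = []
    for det in detections:
        curve = sample_bezier(det["bezier_ctrl_pts"])
        if all(discrete_frechet(curve, kept_curve) > frechet_threshold
               for _, kept_curve in kept):
            kept.append((det, curve))
    return [det for det, _ in kept]
```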
References:
[1] Zou, M., Dai, T., Petitpierre, R., Vaienti, B., Kaplan, F., & di Lenardo, I. (2025). Recognizing and Sequencing Multi-word Texts in Maps Using an Attentive Pointer.
[2] Lin, Y., Li, Z., Chiang, Y.Y., & Weinman, J. (2024). Rumsey Train and Validation Data for ICDAR'24 MapText Competition (Version 1.3). Zenodo. https://doi.org/10.5281/zenodo.11516933
[3] Kim, J., Li, Z., Lin, Y., Namgung, M., Jang, L., & Chiang, Y.Y. (2023). The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps. In: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems. https://arxiv.org/abs/2306.17059
[4] Dai, T., Johnson, K., Petitpierre, R., Vaienti, B., & di Lenardo, I. (2025). Paris and Jerusalem Maps Text Dataset (Version 1.0.0). Zenodo. https://doi.org/10.5281/zenodo.14982662
method: Baseline TESTR Finetuned (2025-04-29)
Authors: Organizers
Affiliation: ICDAR’25 RRC-MapText
Description: The TESTR checkpoint with the polygon prediction head, previously tuned on TotalText, is further finetuned on the competition training data available for each dataset.
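As context, this baseline amounts to standard checkpoint finetuning in the detectron2 training pattern on which TESTR is built. The sketch below is generic and hedged: the config path, checkpoint file, dataset names, and solver values are placeholders, and TESTR's actual repository extends the detectron2 config with its own keys and requires registering the competition data as detectron2 datasets.

```python
# Generic detectron2-style finetuning sketch. Paths, dataset names, and solver
# settings are placeholders; TESTR's own repository adds custom config keys and
# dataset registration on top of this pattern.
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file("configs/TESTR_polygon.yaml")   # placeholder config file
cfg.MODEL.WEIGHTS = "testr_totaltext_polygon.pth"   # TotalText-tuned checkpoint (placeholder)
cfg.DATASETS.TRAIN = ("maptext_train",)             # competition training split (must be registered)
cfg.DATASETS.TEST = ("maptext_val",)
cfg.SOLVER.BASE_LR = 1e-4                           # reduced learning rate for finetuning
cfg.SOLVER.MAX_ITER = 20000
cfg.OUTPUT_DIR = "./output_maptext_finetune"

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)  # load MODEL.WEIGHTS without resuming the iteration counter
trainer.train()
```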
Results (leaderboard metric groups: Overall, Words):

| Date | Method | H-Mean | Precision | Recall | Tightness | Char Accuracy | Char Quality | Det Quality | F-Score |
|---|---|---|---|---|---|---|---|---|---|
| 2025-03-31 | MapText Strong Pipeline | 91.13% | 95.88% | 91.84% | 83.75% | 94.04% | 73.89% | 78.57% | 93.82% |
| 2025-03-28 | Self-Sequencer | 90.30% | 91.52% | 89.13% | 86.14% | 94.86% | 73.79% | 77.79% | 90.31% |
| 2025-04-29 | Baseline TESTR Finetuned | 89.53% | 89.14% | 90.04% | 86.28% | 92.92% | 71.82% | 77.30% | 89.59% |
| 2025-01-10 | [Baseline MapText '24] MapText Detection and Recognition Strong Pipeline | 89.26% | 96.16% | 85.01% | 83.27% | 93.97% | 70.61% | 75.14% | 90.24% |
| 2025-01-10 | [Baseline MapText '24] MapTest | 87.37% | 90.47% | 88.23% | 81.82% | 89.51% | 65.42% | 73.09% | 89.34% |
| 2025-04-19 | CREPE + BezierCurve | 84.93% | 87.10% | 86.53% | 73.62% | 95.47% | 61.02% | 63.91% | 86.81% |
| 2025-01-10 | [Baseline MapText '24] MapTextSpotter | 84.52% | 92.61% | 81.51% | 81.44% | 83.46% | 58.94% | 70.62% | 86.71% |
| 2025-01-10 | [Baseline MapText '24] DS-LP | 77.55% | 71.76% | 78.93% | 71.63% | 90.83% | 48.90% | 53.84% | 75.17% |
| 2025-01-10 | [Baseline MapText '24] TESTR Checkpoint | 74.61% | 71.85% | 66.90% | 79.55% | 82.10% | 45.26% | 55.12% | 69.29% |
| 2025-04-20 | Word-Level Text Detection and Recognition on Historical Maps Using Preprocessing and PaddleOCR | 59.46% | 59.54% | 40.01% | 73.89% | 83.71% | 29.60% | 35.36% | 47.86% |
| 2025-04-19 | YOLOv8_ViTAE_PolygonDetector | 27.42% | 61.75% | 9.64% | 75.97% | 77.96% | 9.88% | 12.67% | 16.68% |