- Task 1 - Detection
- Task 2 - Detection-Linking
- Task 3 - Detection-Recognition
- Task 4 - Detection-Recognition-Linking
Method: MapText Strong Pipeline (2025-03-31)
Authors: Yu Xie, Canhui Xu, Jielei Zhang, Pengyu Chen, Weihang Wang, Yuchen He, Peiyi Li, Yihan Meng, Longwen Gao
Affiliation: Bilibili Inc., QUST
Description: For the English MapText detection task, we employed DNTextSpotter, a novel denoising training method based on DeepSolo. For the Chinese MapText detection task, we utilized DeepSolo. Data augmentation techniques, including cropping, scaling, and adjustments to saturation and contrast, were applied. Pre-training was conducted using available real-world datasets such as TextOCR, TotalText, IC15, and MLT2017. Post-processing methods were also adopted.
@inproceedings{ye2023deepsolo, title={Deepsolo: Let transformer decoder with explicit points solo for text spotting}, author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng}, booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, pages={19348--19357}, year={2023} }
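As a rough illustration of the augmentations mentioned in the description above, the following is a minimal standalone sketch (not the submission's code): saturation/contrast jitter plus a random crop-and-rescale that keeps word polygons aligned with the image. The polygon format, jitter ranges, crop ranges, and output size are illustrative assumptions.

```python
# Minimal sketch (illustrative only): photometric jitter plus a random
# crop-and-rescale that transforms word polygons together with the pixels.
# Assumes each polygon is an (N, 2) array of pixel coordinates.
import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(image: Image.Image, polygons: list[np.ndarray], out_size: int = 1024):
    # Photometric jitter: saturation and contrast.
    image = ImageEnhance.Color(image).enhance(random.uniform(0.7, 1.3))
    image = ImageEnhance.Contrast(image).enhance(random.uniform(0.7, 1.3))

    # Random crop covering 60-100% of each side (ranges are placeholders).
    w, h = image.size
    cw, ch = int(w * random.uniform(0.6, 1.0)), int(h * random.uniform(0.6, 1.0))
    x0, y0 = random.randint(0, w - cw), random.randint(0, h - ch)
    image = image.crop((x0, y0, x0 + cw, y0 + ch))

    # Rescale the crop to a fixed training size and move the polygons with it.
    sx, sy = out_size / cw, out_size / ch
    image = image.resize((out_size, out_size), Image.BILINEAR)
    polygons = [(p - [x0, y0]) * [sx, sy] for p in polygons]
    return image, polygons
```

Filtering of words that fall outside the crop is omitted for brevity; in practice the geometric transforms run inside the detector's data loader so that annotations and pixels stay consistent.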
Method: Self-Sequencer (2025-03-28)
Authors: Mengjie Zou, Tianhao Dai, Remi Petitpierre, Beatrice Vaienti, Frederic Kaplan, Isabella di Lenardo
Affiliation: EPFL, Swiss Federal Institute of Technology in Lausanne
Email: remi.petitpierre@epfl.ch
Description: For word detection and recognition, our approach relies on DeepSolo, whose architecture is derived from Detection Transformers (DETR). In short, DeepSolo extracts hierarchical visual features from map images and processes them through an encoder-decoder architecture to detect words as segments bounded by Bézier curves. The model specifically returns four control points of central Bézier curves per word and then uniformly samples query points along these curves to segment, classify, and delineate each text instance precisely. To resolve duplicate word detections, we implement a postprocessing step inspired by Non-Maximum Suppression. It involves calculating the Fréchet distance between the Bézier curves of potential duplicate word pairs, or "directional synonyms", and merging those below a defined threshold. More details on the model, algorithms, and specific implementation are provided in our separate article [1].
The model training leverages several real and synthetic datasets: ICDAR MapText [2], MapKuratorHuman [3], SynthMap [3], and the Paris and Jerusalem Maps Text Dataset [4].
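The NMS-inspired merging of "directional synonyms" described above can be sketched roughly as follows: sample points along each word's central cubic Bézier curve from its four control points, compute the discrete Fréchet distance between candidate pairs (also against the reversed polyline), and keep only the higher-scoring detection when the distance falls below a threshold. This is a simplified illustration, not the implementation from [1]; the threshold and scores are placeholders.

```python
# Simplified sketch of the Frechet-distance deduplication (illustrative only).
import numpy as np

def sample_bezier(ctrl: np.ndarray, n: int = 20) -> np.ndarray:
    """Uniformly sample n points on a cubic Bezier curve given its (4, 2) control points."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2, p3 = ctrl
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def discrete_frechet(P: np.ndarray, Q: np.ndarray) -> float:
    """Discrete Frechet distance between two polylines, by dynamic programming."""
    n, m = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise point distances
    ca = np.full((n, m), np.inf)
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return float(ca[-1, -1])

def dedup(curves: list[np.ndarray], scores: list[float], thresh: float = 10.0) -> list[int]:
    """Greedy NMS-style suppression; `thresh` (in pixels) is a placeholder value."""
    order = sorted(range(len(curves)), key=lambda i: -scores[i])
    pts = [sample_bezier(c) for c in curves]
    kept = []
    for i in order:
        # Compare against kept curves and their reversed polylines, so that
        # "directional synonyms" (same word, opposite reading direction) also merge.
        if all(min(discrete_frechet(pts[i], pts[k]),
                   discrete_frechet(pts[i], pts[k][::-1])) > thresh for k in kept):
            kept.append(i)
    return kept
```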
References:
[1] Zou, M., Dai, T., Petitpierre, R., Vaienti, B., Kaplan, F., & di Lenardo, I. (2025). Recognizing and Sequencing Multi-word Texts in Maps Using an Attentive Pointer.
[2] Lin, Y., Li, Z., Chiang, Y.Y., & Weinman, J. (2024). Rumsey Train and Validation Data for ICDAR'24 MapText Competition (Version 1.3). Zenodo. https://doi.org/10.5281/zenodo.11516933
[3] Kim, J., Li, Z., Lin, Y., Namgung, M., Jang, L., & Chiang, Y.Y. (2023). The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps. In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems. https://arxiv.org/abs/2306.17059
[4] Dai, T., Johnson, K., Petitpierre, R., Vaienti, B., & di Lenardo, I. (2025). Paris and Jerusalem Maps Text Dataset (Version 1.0.0). Zenodo. https://doi.org/10.5281/zenodo.14982662
Method: [MapText'24 Baseline] MapText Detection Strong Pipeline (2025-01-09)
Authors: Yu Xie, Jielei Zhang, Ziyue Wang, Yuchen He, Yihan Meng, Weihang Wang, Peiyi Li, Longwen Gao, Qian Qiao
Affiliation: Bilibili Inc.
Description: For the MapText detection task, we employed ViTAE-v2 to extract global features within an encoder-decoder network architecture (DeepSolo). Data augmentation techniques such as cropping, scaling, and saturation and contrast adjustment were applied. Pre-training was conducted on available real datasets (TextOCR, TotalText, IC15, MLT2017). Post-processing methods were also adopted.
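Since DeepSolo builds on detectron2, the multi-dataset pre-training mentioned above is typically expressed by listing several registered datasets in the training configuration. The minimal sketch below only illustrates that pattern; the dataset names are hypothetical placeholders, not the submission's actual config.

```python
# Hedged sketch of a detectron2-style pre-training config (illustrative only).
# The dataset names are placeholders and must be registered in DatasetCatalog.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.DATASETS.TRAIN = ("textocr_train", "totaltext_train", "icdar2015_train", "mlt2017_train")
cfg.DATASETS.TEST = ("maptext_rumsey_val",)
# Pre-train on the real scene-text mix above, then fine-tune on the MapText
# training split before evaluation.
```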
Results (H-Mean is the overall score; Precision, Recall, Tightness, Quality, and F-Score are word-level metrics):

| Date | Method | H-Mean | Precision | Recall | Tightness | Quality | F-Score |
|---|---|---|---|---|---|---|---|
| 2025-03-31 | MapText Strong Pipeline | 90.20% | 95.88% | 91.84% | 83.75% | 78.57% | 93.82% |
| 2025-03-28 | Self-Sequencer | 88.88% | 91.52% | 89.13% | 86.14% | 77.79% | 90.31% |
| 2025-01-09 | [MapText'24 Baseline] MapText Detection Strong Pipeline | 88.70% | 94.19% | 89.92% | 82.75% | 76.13% | 92.01% |
| 2025-04-29 | Baseline TESTR Finetuned | 88.46% | 89.14% | 90.04% | 86.28% | 77.30% | 89.59% |
| 2025-04-20 | PolyTextTR | 86.72% | 90.28% | 87.78% | 82.48% | 73.42% | 89.01% |
| 2025-03-27 | MapTextSpotter | 84.88% | 92.61% | 81.51% | 81.45% | 70.62% | 86.71% |
| 2025-04-19 | CREPE + BezierCurve | 81.92% | 87.10% | 86.53% | 73.62% | 63.91% | 86.81% |
| 2025-04-19 | YOLOv8-ViTAE-Polygon | 60.36% | 54.10% | 56.28% | 74.37% | 41.03% | 55.16% |
| 2025-04-20 | Word-Level Text Detection on Historical Maps Using Multi-Stage Preprocessing and PaddleOCR | 54.23% | 59.54% | 40.01% | 73.89% | 35.36% | 47.86% |
| 2025-03-27 | MapTextSpotter | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |