- Task 1 - Detection
- Task 2 - Detection-Linking
- Task 3 - Detection-Recognition
- Task 4 - Detection-Recognition-Linking
Method: MapText Strong Pipeline (2025-03-31)
Authors: Yu Xie, Canhui Xu, Jielei Zhang, Pengyu Chen, Weihang Wang, Yuchen He, Peiyi Li, Yihan Meng, Longwen Gao
Affiliation: Bilibili Inc., QUST
Description: For the English MapText detection task, we employed DNTextSpotter, a novel denoising training method based on DeepSolo. For the Chinese MapText detection task, we utilized DeepSolo. Data augmentation techniques, including cropping, scaling, and adjustments to saturation and contrast, were applied. Pre-training was conducted using available real-world datasets such as TextOCR, TotalText, IC15, and MLT2017. Post-processing methods were also adopted.
@inproceedings{ye2023deepsolo, title={Deepsolo: Let transformer decoder with explicit points solo for text spotting}, author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng}, booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, pages={19348--19357}, year={2023} }
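As a rough illustration of the augmentations mentioned in the description above, the following is a minimal standalone sketch (not the submission's code): saturation/contrast jitter plus a random crop-and-rescale that keeps word polygons aligned with the image. The polygon format, jitter ranges, crop ranges, and output size are illustrative assumptions.

```python
# Minimal sketch (illustrative only): photometric jitter plus a random
# crop-and-rescale that transforms word polygons together with the pixels.
# Assumes each polygon is an (N, 2) array of pixel coordinates.
import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(image: Image.Image, polygons: list[np.ndarray], out_size: int = 1024):
    # Photometric jitter: saturation and contrast.
    image = ImageEnhance.Color(image).enhance(random.uniform(0.7, 1.3))
    image = ImageEnhance.Contrast(image).enhance(random.uniform(0.7, 1.3))

    # Random crop covering 60-100% of each side (ranges are placeholders).
    w, h = image.size
    cw, ch = int(w * random.uniform(0.6, 1.0)), int(h * random.uniform(0.6, 1.0))
    x0, y0 = random.randint(0, w - cw), random.randint(0, h - ch)
    image = image.crop((x0, y0, x0 + cw, y0 + ch))

    # Rescale the crop to a fixed training size and move the polygons with it.
    sx, sy = out_size / cw, out_size / ch
    image = image.resize((out_size, out_size), Image.BILINEAR)
    polygons = [(p - [x0, y0]) * [sx, sy] for p in polygons]
    return image, polygons
```

Filtering of words that fall outside the crop is omitted for brevity; in practice the geometric transforms run inside the detector's data loader so that annotations and pixels stay consistent.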
Method: Self-Sequencer (2025-03-28)
Authors: Mengjie Zou, Tianhao Dai, Remi Petitpierre, Beatrice Vaienti, Frederic Kaplan, Isabella di Lenardo
Affiliation: EPFL, Swiss Federal Institute of Technology in Lausanne
Email: remi.petitpierre@epfl.ch
Description: For word detection and recognition, our approach relies on DeepSolo, whose architecture is derived from Detection Transformers (DETR). In short, DeepSolo extracts hierarchical visual features from map images and processes them through an encoder-decoder architecture to detect words as segments bounded by Bézier curves. The model specifically returns four control points of central Bézier curves per word and then uniformly samples query points along these curves to segment, classify, and delineate each text instance precisely. To resolve duplicate word detections, we implement a postprocessing step inspired by Non-Maximum Suppression. It involves calculating the Fréchet distance between the Bézier curves of potential duplicate word pairs, or "directional synonyms", and merging those below a defined threshold. More details on the model, algorithms, and specific implementation are provided in our separate article [1].
The model training leverages several real and synthetic datasets: ICDAR MapText [2], MapKuratorHuman [3], SynthMap [3], and the Paris and Jerusalem Maps Text Dataset [4].
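The NMS-inspired merging of "directional synonyms" described above can be sketched roughly as follows: sample points along each word's central cubic Bézier curve from its four control points, compute the discrete Fréchet distance between candidate pairs (also against the reversed polyline), and keep only the higher-scoring detection when the distance falls below a threshold. This is a simplified illustration, not the implementation from [1]; the threshold and scores are placeholders.

```python
# Simplified sketch of the Frechet-distance deduplication (illustrative only).
import numpy as np

def sample_bezier(ctrl: np.ndarray, n: int = 20) -> np.ndarray:
    """Uniformly sample n points on a cubic Bezier curve given its (4, 2) control points."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2, p3 = ctrl
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def discrete_frechet(P: np.ndarray, Q: np.ndarray) -> float:
    """Discrete Frechet distance between two polylines, by dynamic programming."""
    n, m = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # pairwise point distances
    ca = np.full((n, m), np.inf)
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d[i, j])
    return float(ca[-1, -1])

def dedup(curves: list[np.ndarray], scores: list[float], thresh: float = 10.0) -> list[int]:
    """Greedy NMS-style suppression; `thresh` (in pixels) is a placeholder value."""
    order = sorted(range(len(curves)), key=lambda i: -scores[i])
    pts = [sample_bezier(c) for c in curves]
    kept = []
    for i in order:
        # Compare against kept curves and their reversed polylines, so that
        # "directional synonyms" (same word, opposite reading direction) also merge.
        if all(min(discrete_frechet(pts[i], pts[k]),
                   discrete_frechet(pts[i], pts[k][::-1])) > thresh for k in kept):
            kept.append(i)
    return kept
```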
References:
[1] Zou, M., Dai, T., Petitpierre, R., Vaienti, B., Kaplan, F., & di Lenardo, I. (2025). Recognizing and Sequencing Multi-word Texts in Maps Using an Attentive Pointer.
[2] Lin, Y., Li, Z., Chiang, Y.Y., & Weinman, J. (2024). Rumsey Train and Validation Data for ICDAR'24 MapText Competition (Version 1.3). Zenodo. https://doi.org/10.5281/zenodo.11516933
[3] Kim, J., Li, Z., Lin, Y., Namgung, M., Jang, L., & Chiang, Y.Y. (2023). The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps. In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems. https://arxiv.org/abs/2306.17059
[4] Dai, T., Johnson, K., Petitpierre, R., Vaienti, B., & di Lenardo, I. (2025). Paris and Jerusalem Maps Text Dataset (Version 1.0.0). Zenodo. https://doi.org/10.5281/zenodo.14982662
Method: [MapText'24 Baseline] MapText Detection Strong Pipeline (2025-01-09)
Authors: Yu Xie, Jielei Zhang, Ziyue Wang, Yuchen He, Yihan Meng, Weihang Wang, Peiyi Li, Longwen Gao, Qian Qiao
Affiliation: Bilibili Inc.
Description: For the MapText detection task, we employed ViTAE-v2 to extract global features within an encoder-decoder network architecture (DeepSolo). Data augmentation techniques such as cropping, scaling, and saturation and contrast adjustment were applied. Pre-training was conducted on available real datasets (TextOCR, TotalText, IC15, MLT2017). Post-processing methods were also adopted.
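Since DeepSolo builds on detectron2, the multi-dataset pre-training mentioned above is typically expressed by listing several registered datasets in the training configuration. The minimal sketch below only illustrates that pattern; the dataset names are hypothetical placeholders, not the submission's actual config.

```python
# Hedged sketch of a detectron2-style pre-training config (illustrative only).
# The dataset names are placeholders and must be registered in DatasetCatalog.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.DATASETS.TRAIN = ("textocr_train", "totaltext_train", "icdar2015_train", "mlt2017_train")
cfg.DATASETS.TEST = ("maptext_rumsey_val",)
# Pre-train on the real scene-text mix above, then fine-tune on the MapText
# training split before evaluation.
```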
Results (H-Mean is the overall score; Precision, Recall, Tightness, Quality, and F-Score are word-level metrics):

| Date | Method | H-Mean | Precision | Recall | Tightness | Quality | F-Score |
|---|---|---|---|---|---|---|---|
| 2025-03-31 | MapText Strong Pipeline | 90.20% | 95.88% | 91.84% | 83.75% | 78.57% | 93.82% |
| 2025-03-28 | Self-Sequencer | 88.88% | 91.52% | 89.13% | 86.14% | 77.79% | 90.31% |
| 2025-01-09 | [MapText'24 Baseline] MapText Detection Strong Pipeline | 88.70% | 94.19% | 89.92% | 82.75% | 76.13% | 92.01% |
| 2025-04-29 | Baseline TESTR Finetuned | 88.46% | 89.14% | 90.04% | 86.28% | 77.30% | 89.59% |
| 2025-04-20 | PolyTextTR | 86.72% | 90.28% | 87.78% | 82.48% | 73.42% | 89.01% |
| 2025-03-27 | MapTextSpotter | 84.88% | 92.61% | 81.51% | 81.45% | 70.62% | 86.71% |
| 2025-04-19 | CREPE + BezierCurve | 81.92% | 87.10% | 86.53% | 73.62% | 63.91% | 86.81% |
| 2025-04-19 | YOLOv8-ViTAE-Polygon | 60.36% | 54.10% | 56.28% | 74.37% | 41.03% | 55.16% |
| 2025-04-20 | Word-Level Text Detection on Historical Maps Using Multi-Stage Preprocessing and PaddleOCR | 54.23% | 59.54% | 40.01% | 73.89% | 35.36% | 47.86% |
| 2025-03-27 | MapTextSpotter | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |