Tasks - ICDAR 2025 Competition on Historical Map Text Detection, Recognition, and Linking

The competition consists of multiple interrelated tasks on historical maps, including text detection and recognition at the word and phrase levels. The four primary competition tasks are:

  1. Word Detection
  2. Phrase Detection (Word Detection and Linking)
  3. Word Detection and Recognition
  4. Phrase Detection and Recognition

We will evaluate each task on up to three datasets (see details on the Downloads page):

  1. map images from the David Rumsey Map Collection, covering a wide range of map styles,
  2. map images from historical French Land Registers, focusing on a specific area of France over the course of the 19th century,
  3. map images from the Taiwan Historical Maps System, published during the first half of the 20th century (1900–1960) and featuring traditional Chinese characters.

File Formats

All tasks share the same file formats for both ground truth and submissions, though certain fields or elements may be ignored when irrelevant to the task.

Note that all coordinates are given in pixels with respect to the image, starting at (0,0) in the top-left corner.

Ground Truth Format

The same ground truth file and format are used for all tasks: a list of dictionaries (one per image), each of which has a list of phrase groups, each consisting of an ordered list of words. The JSON file (UTF-8 encoded) has the following format:

[ # Begin a list of images
    {
     "image": "IMAGE_NAME1",
     "groups": [ # Begin a list of phrase groups for the image
         [  # Begin a list of words for the phrase
           {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT1", "illegible": False, "truncated": False},
           ...,
           {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT2", "illegible": True, "truncated": False}
         ],
          ...
         [ {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT3", "illegible": False, "truncated": True}, ... ]
     ] },
    {
     "image": "IMAGE_NAME2",
     "groups": [
         [
           {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT4", "illegible": False, "truncated": False},
           ...,
           {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT5", "illegible": False, "truncated": False}],
          ...
         [ {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT6", "illegible": False, "truncated": False}, ... ] 
     ] },
     ...
]

(Small sample file.) For each image, the "groups" field stores a list of groups, where each entry represents the list of words within a group, given in reading order (used in Tasks 2 and 4). The "vertices" field of each word stores a list of coordinate pairs representing the vertices of a bounding polygon for the given word; there must be at least three vertices, but no specific arrangement is otherwise required.

For detection tasks (1 and 2), the "text" field of the words can be ignored. For non-grouping tasks (1 and 3), the grouping structure can be ignored and only the lists of words are needed.

Words that are marked truncated or illegible will be ignored in the evaluation. For linking tasks (2 and 4), groups that contain any ignored word will be ignored in the evaluation.
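
To make these filtering rules concrete, here is a minimal Python sketch (the file name ground_truth.json is a hypothetical placeholder) that loads a ground truth file and separates the words and groups that the evaluation would keep:

    import json

    # Hypothetical file name; use the actual ground truth file from the Downloads page.
    with open("ground_truth.json", encoding="utf-8") as f:
        images = json.load(f)  # one dictionary per image

    for entry in images:
        kept_words = []   # words scored in Tasks 1 and 3
        kept_groups = []  # groups scored in Tasks 2 and 4
        for group in entry["groups"]:
            # Words marked truncated or illegible are ignored in the evaluation.
            kept_words.extend(
                w for w in group if not (w["illegible"] or w["truncated"])
            )
            # A group containing any ignored word is ignored for the linking tasks.
            if all(not (w["illegible"] or w["truncated"]) for w in group):
                kept_groups.append(group)
        print(entry["image"], len(kept_words), "words,", len(kept_groups), "groups")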

Submission Format

All tasks use the same basic submission format (and can accept the same file): a list of dictionaries (one per image), each of which has a list of phrase groups, each consisting of an ordered list of predicted words. Some fields and structures will be ignored in tasks where they are irrelevant. The JSON file (UTF-8 encoded) has the following format:

[ # Begin a list of images
    {
     "image": "IMAGE_NAME1",
     "groups": [ # Begin a list of phrase groups for the image
        [ # Begin a list of words for the phrase
          {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT1"},
          ...,
          {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT2"}
       ],
       ...
       [ {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT3}, ... ]
    ] },
    {
     "image": "IMAGE_NAME2",
     "groups": [
        [
          {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT4"},
          ...,
          {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT5"}
        ],
        ...
        [ {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT6"}, ... ] 
    ] },
    ...
]

(Small sample file.) The "groups" field for each image stores a list of groups, where each entry represents the list of words within a group. The "vertices" field of each word stores a list of coordinates (number pairs) representing the vertices of a bounding polygon for the given word; there must be at least three points and no specific arrangement is required otherwise.

Note:

  • The membership of words in groups is only considered during evaluation of Tasks 2 and 4.
  • The "text" field is only considered during evaluation of Tasks 3 and 4.
  • The word order (within a group) is only considered during evaluation of Tasks 2 and 4.
  • The group order (within an image) is never considered.
  • The "image" field must contain the relative path to the image, like {"image": "rumsey/test/000001.png" ... } or {"image": "ign/test/000001.jpg" ... }

Task 1 - Word Detection

The task requires detecting individual words on map images, i.e., generating bounding polygons that enclose text instances at the word level.

Submission Format

The optional "text" field of each word is ignored in evaluation and not required in the JSON file; this allows the same file to be submitted for every task. Grouping is not required for this task; although the groups field is required in the JSON file, the group-level organization is ignored by the evaluation. For example, each image could have one group containing all the words or each word could belong to its own group.

Evaluation Metric

Detected word regions and ground truth elements are optimally matched subject to a minimum IoU requirement (0.5). The correspondence yields a set of true positives (word regions with IoU > 0.5), from which we calculate recall R, precision P, and tightness T (the average IoU among true positives). The competition metric is the harmonic mean of recall, precision, and tightness.

Note that this differs slightly from the 2024 competition metric.
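
A minimal sketch of this computation (not the official evaluation code), assuming a precomputed matrix of polygon IoU values between predictions and ground truth (the matrix itself could be built with a geometry library such as shapely) and Hungarian matching from scipy:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def task1_metric(iou, thresh=0.5):
        """iou: (num_predictions, num_ground_truths) matrix of polygon IoU values."""
        # Optimal one-to-one matching maximizing total IoU.
        rows, cols = linear_sum_assignment(iou, maximize=True)
        matched = iou[rows, cols]
        tp = matched[matched > thresh]      # true positives: IoU > 0.5
        recall = len(tp) / iou.shape[1]     # fraction of ground truth words found
        precision = len(tp) / iou.shape[0]  # fraction of predictions that match
        tightness = tp.mean() if len(tp) else 0.0  # average IoU among true positives
        terms = np.array([recall, precision, tightness])
        # Harmonic mean of R, P, T; zero if any term is zero.
        return 0.0 if (terms == 0).any() else 3.0 / (1.0 / terms).sum()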

Task 2 - Phrase Detection

This task requires words to be detected (with their polygon boundaries) and grouped into the ordered lists that constitute label phrases.

Submission Format

As in Task 1, the optional "text" field of each word is ignored during evaluation and need not appear in the JSON file; this allows the same file to be submitted for every task. However, the evaluation is sensitive to the group-level organization of the words, and the word order within each group is considered during evaluation, because the order determines the links (edges) between words, which are what the evaluation measures.

Evaluation Metric

Detected word regions and ground truth words are optimally matched as in Task 1. In addition to calculating the same word-level statistics (precision, recall, and tightness), we evaluate the implied links (edges) between the detected words. A link is counted as a true positive when

  1. the two endpoint words are both considered true positive matches,
  2. the constituent ground truth words have a direct link (adjacent within the same group), and
  3. the constituent detected words have a direct link (adjacent within the same group).

Unmatched ground truth links are counted as false negatives and unmatched predicted links as false positives; from these we calculate link recall RL and link precision PL. The competition metric is the harmonic mean of five terms: word recall, precision, and tightness, together with link recall and precision.

Note that this differs substantially from the 2024 competition metric, which did not directly evaluate links but instead implicitly evaluated links by making matches at the group level.
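
A sketch of the link bookkeeping under these rules; it assumes groups are given as lists of word indices in reading order, and that match maps each true-positive predicted word index to its ground truth counterpart (from the word matching above):

    def links(groups):
        """Directed edges between adjacent words within each group."""
        return {(g[i], g[i + 1]) for g in groups for i in range(len(g) - 1)}

    def link_recall_precision(pred_groups, gt_groups, match):
        pred_links = links(pred_groups)  # pairs of predicted word indices
        gt_links = links(gt_groups)      # pairs of ground truth word indices
        # A predicted link is a true positive when both endpoints are matched
        # words and their ground truth counterparts are also directly linked.
        tp = sum(
            1 for a, b in pred_links
            if a in match and b in match and (match[a], match[b]) in gt_links
        )
        recall = tp / len(gt_links) if gt_links else 0.0
        precision = tp / len(pred_links) if pred_links else 0.0
        return recall, precision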

Task 3 - Word Detection and Recognition

In this task, participants are expected to produce word-level text detection and recognition results, i.e., a set of word bounding polygons with corresponding transcriptions.

Submission Format

Submissions require the "text" field for each word. As in Task 1, grouping is not required: although the "groups" field must be present in the JSON file, any group-level organization is ignored by the evaluation. For example, each image could have one group containing all the words, or each word could belong to its own group.

Evaluation Metric

As in Tasks 1 and 2, detected word regions d are matched optimally to ground truth words g; while only the IoU threshold criterion is required for matching, the correspondences are biased to prefer agreement between text strings. In addition to the same word recall and precision, character accuracy is measured as the average complementary normalized edit distance over the corresponding elements (denoted as the set of true positives, TP):

C = \frac{1}{|TP|} \sum_{(d,g) \in TP} \left( 1 - \frac{\mathrm{ED}(t_d, t_g)}{\max(|t_d|, |t_g|)} \right)

where t_d and t_g are the transcriptions of the detected and ground truth words and ED is the character-level edit distance.

The competition metric is thus the harmonic mean of word recall R, precision P, tightness T, and character accuracy C.

Note that this differs from the 2024 competition, where an exact string match was required for detections and ground truth to be considered for correspondence (in addition to the IoU threshold). 
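
For illustration, character accuracy could be computed as below; the edit distance is the standard Levenshtein dynamic program (any Levenshtein library would serve equally well), and normalizing by the longer string is an assumption consistent with the usual definition of NED:

    def edit_distance(a: str, b: str) -> int:
        """Levenshtein distance via the standard dynamic program."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,               # deletion
                               cur[j - 1] + 1,            # insertion
                               prev[j - 1] + (ca != cb))) # substitution
            prev = cur
        return prev[-1]

    def char_accuracy(true_positives):
        """Average complementary normalized edit distance over matched pairs."""
        scores = [
            1 - edit_distance(d, g) / max(len(d), len(g), 1)
            for d, g in true_positives  # (predicted text, ground truth text) pairs
        ]
        return sum(scores) / len(scores) if scores else 0.0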

Task 4 - Phrase Detection and Recognition

This task requires detection and recognition at the phrase level. Submissions must group words (polygons and transcriptions) into phrases, each an ordered list of words.

Submission Format

All elements of the submission file format described above are required; "text" transcriptions and groupings, as well as the word ordering, are all considered in the evaluation.

Evaluation Metric

Correspondences between predicted and ground truth words are found as in Task 3. Links are calculated as in Task 2. The competition metric is the harmonic mean among all measurements: word recall, precision, tightness, and character accuracy, as well as link recall and precision.

Note that this also differs from the 2024 competition in several respects. As with Task 2, matches are no longer made at the group level but at the word level, and links are explicitly evaluated.

Evaluation Metric Summary

The following table identifies the constituent terms of the harmonic mean used for competition ranking.

 Task                                   Words            Links
                                        P   R   T*  C    PL  RL
 1 - Word Detection                     ✓   ✓   ✓
 2 - Phrase Detection                   ✓   ✓   ✓        ✓   ✓
 3 - Word Detection and Recognition     ✓   ✓   ✓   ✓
 4 - Phrase Detection and Recognition   ✓   ✓   ✓   ✓    ✓   ✓

* Note that tightness is not included in the IGN French data set evaluation for competition due to high annotator variability on short cursive words.

FAQs

  1. What training data is allowed?
    • Can I use private data for the competition?
      • Yes, but only under two conditions:

        1. The use is disclosed with the submission, and
        2. The data is made public for the benefit of the community. To be included in the competition, link(s) to the data must be included with any submission using private data.

        In particular, competitors must take great care to exclude any labeled training data that overlaps with the competition test data set. Historical maps are often printed in multiple editions from the same engraving or scanned into a variety of digital libraries. Entries whose labeled training data is discovered to contain a test map will be disqualified.
    • Can I use synthetic data for the competition?
      • Yes; submitters are strongly encouraged to share their data and/or synthesis method.
    • Can I use labeled public data sets (i.e., ICDAR13, COCOText, ArT, etc.)?
      • Yes; submitters are encouraged to indicate any training data sets used.
    • Can I use publicly available unlabeled data?
      • Yes, except for data included in the test set.
  2. Does my submission have to include all data sets (Rumsey, French Land Register, Taiwanese Maps) for evaluation?
    • No, each data set will be evaluated separately, and omitting one will not influence evaluation of the others. While such "empty" results will necessarily appear in the online evaluation, they will be manually excluded from the competition report.
  3. What is the character set for the data?
    • The Latin character set (including diacritics and digraphs), numerals, punctuation, common keyboard symbols, and select special symbols such as ™ and ®/©.
    • The annotations for the maps from the Rumsey Map Collection are expected to be in English, but since they cover worldwide historical geography, occasional diacritics (e.g., ü or ø) and digraphs (e.g., æ) are to be expected.
    • The annotations for the 19th century Cadastre maps from the French data were restricted to the following charset: "abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûüœÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÂÇÉÈÊËÎÏÔÙÛÜŒŸ0123456789'.,-+/*()&=#".
    • The annotations for the Taiwanese maps contain traditional Chinese characters falling within the following Unicode ranges (otherwise the "illegible" flag of the text instance is set to true): 0x3400-0x4DBF, 0x4E00-0x9FFF, 0xF900-0xFAFF, 0x20000-0x2A6DF, 0x2A700-0x2EBEF (see the range-check sketch after this list).
  4. How many submissions can I make?
    • For the competition, each participant may submit as many times as they like, but only the final pre-deadline submission will be included for the competition report and ranking.
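
For participants working with the Taiwanese maps, a small sketch (names hypothetical) that checks whether a transcription falls entirely within the Unicode ranges listed in FAQ 3:

    # Traditional Chinese ranges from FAQ 3 above.
    CJK_RANGES = [
        (0x3400, 0x4DBF), (0x4E00, 0x9FFF), (0xF900, 0xFAFF),
        (0x20000, 0x2A6DF), (0x2A700, 0x2EBEF),
    ]

    def in_allowed_ranges(text: str) -> bool:
        """True if every character lies in one of the allowed ranges."""
        return all(
            any(lo <= ord(ch) <= hi for lo, hi in CJK_RANGES) for ch in text
        )

    print(in_allowed_ranges("臺灣"))  # True: both characters are in 0x4E00-0x9FFF
    print(in_allowed_ranges("abc"))   # False: Latin letters fall outside the ranges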

Challenge News

Important Dates

2024-12-10: Website live, train/val sets available

2025-03-01: Test set available, submissions are open

2025-04-01: Final submission deadline, including short reports