Tasks - ICDAR 2025 Competition on Historical Map Text Detection, Recognition, and Linking

Tasks

The competition consists of multiple interrelated tasks on historical maps, covering text detection and recognition at the word and phrase levels. The four primary competition tasks are:

  1. Word Detection
  2. Phrase Detection (Word Linking)
  3. Word Detection and Recognition
  4. Phrase Detection and Recognition

We will evaluate each task on two datasets. One dataset consists of map images from the David Rumsey Map Collection, covering a wide range of map styles; the other, from the French Land Registers, contains maps tailored to a specific place and time. See the Downloads page for details.

File Formats

All tasks share the same file formats for both ground truth and submissions, though certain fields or elements may be ignored when irrelevant for the task.

Note that all coordinates are given in pixels with respect to the image, starting at (0,0) in the top-left corner.

Ground Truth Format

The same ground truth file and format are used for all tasks: a list of dictionaries (one per image), each of which contains a list of phrase groups, each group consisting of an ordered list of words. The JSON file (UTF-8 encoded) has the following format:

[ # Begin a list of images
    {
     "image": "IMAGE_NAME1",
     "groups": [ # Begin a list of phrase groups for the image
         [  # Begin a list of words for the phrase
           {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT1", "illegible": false, "truncated": false},
           ...,
           {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT2", "illegible": true, "truncated": false}
         ],
          ...
         [ {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT3", "illegible": false, "truncated": true}, ... ]
     ] },
    {
     "image": "IMAGE_NAME2",
     "groups": [
         [
           {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT4", "illegible": false, "truncated": false},
           ...,
           {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT5", "illegible": false, "truncated": false}
         ],
          ...
         [ {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT6", "illegible": false, "truncated": false}, ... ]
     ] },
     ...
]

(Small sample file.) For each image, the "groups" field stores a list of groups, where each entry represents the list of words within a group, given in reading order (for Task 4). The "vertices" field of each word stores a list of coordinates (number pairs) representing the vertices of a bounding polygon for the given word; there must be at least three points, and no specific arrangement is required otherwise.


For detection tasks (1 and 2), the "text" field of the words can be ignored. For non-grouping tasks (1 and 3), the grouping structure can be ignored and only the lists of words are needed.

Words that are marked truncated or illegible will be ignored in the evaluation. For linking tasks (2 and 4), groups that contain any ignored word will be ignored in the evaluation.
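
To illustrate how these rules might be applied in practice, here is a minimal Python sketch (the file name and helper name are placeholders, not part of any official toolkit) that loads a ground truth file and drops ignored words, and, for the linking tasks, whole groups containing an ignored word:

import json

def load_ground_truth(path, linking_task=False):
    # Load the list-of-images structure described above.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for entry in data:
        kept_groups = []
        for group in entry["groups"]:
            # Tasks 2 and 4: a group containing any ignored word is ignored entirely.
            if linking_task and any(w["illegible"] or w["truncated"] for w in group):
                continue
            # All tasks: individual illegible/truncated words are ignored.
            kept = [w for w in group if not (w["illegible"] or w["truncated"])]
            if kept:
                kept_groups.append(kept)
        entry["groups"] = kept_groups
    return data

# e.g., words retained for Task 1 evaluation:
# gt = load_ground_truth("train.json", linking_task=False)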

Submission Format

All tasks use the same basic submission format (and can accept the same file): a list of dictionaries (one per image), each of which contains a list of phrase groups, each group consisting of an ordered list of predicted words. Some fields and structures will be ignored in certain tasks when they are irrelevant. The JSON file (UTF-8 encoded) has the following format:

[ # Begin a list of images
    {
     "image": "IMAGE_NAME1",
     "groups": [ # Begin a list of phrase groups for the image
        [ # Begin a list of words for the phrase
          {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT1"},
          ...,
          {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT2"}
       ],
       ...
        [ {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT3"}, ... ]
    ] },
    {
     "image": "IMAGE_NAME2",
     "groups": [
        [
          {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT4"},
          ...,
          {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT5"}
        ],
        ...
        [ {"vertices": [[x1, y1], [x2, y2], ..., [xN, yN]], "text": "TEXT6"}, ... ] 
    ] },
    ...
]

(Small sample file.) The "groups" field for each image stores a list of groups, where each entry represents the list of words within a group. The "vertices" field of each word stores a list of coordinates (number pairs) representing the vertices of a bounding polygon for the given word; there must be at least three points and no specific arrangement is required otherwise.

Note:

  • The membership of words in groups is only considered during evaluation of Tasks 2 and 4.
  • The "text" field is only considered during evaluation of Tasks 3 and 4.
  • The word order (within a group) is only considered during evaluation of Task 4.
  • The group order (within an image) is never considered.
  • The "image" field must contain the relative path to the image, e.g., {"image": "rumsey/test/000001.png", ... } or {"image": "ign/test/000001.jpg", ... }.
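
To make the expected structure concrete, the following Python sketch assembles and writes a submission file; the image path and the single two-word phrase are purely illustrative:

import json

# Illustrative predictions: {image path: list of phrases,
# each phrase an ordered list of (polygon, transcription) pairs}.
predictions = {
    "rumsey/test/000001.png": [
        [([[10, 10], [90, 10], [90, 40], [10, 40]], "NEW"),
         ([[100, 10], [180, 10], [180, 40], [100, 40]], "YORK")],
    ],
}

submission = []
for image, phrases in predictions.items():
    groups = [[{"vertices": poly, "text": text} for poly, text in phrase]
              for phrase in phrases]
    submission.append({"image": image, "groups": groups})

with open("submission.json", "w", encoding="utf-8") as f:
    json.dump(submission, f, ensure_ascii=False)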

Task 1 - Word Detection

The task requires detecting individual words on map images, i.e., generating bounding polygons that enclose text instances at the word level.

Submission Format

The optional "text" field of each word is ignored in evaluation and not required in the JSON file; this allows the same file to be submitted for every task. Grouping is not required for this task; although the "groups" field is required in the JSON file, the group-level organization is ignored by the evaluation. For example, each image could have one group containing all the words, or each word could belong to its own group, as sketched below.
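
For instance, a flat list of predicted word polygons can simply be wrapped so that every word forms its own singleton group; a minimal Python sketch with illustrative values:

word_polygons = [
    [[10, 10], [90, 10], [90, 40], [10, 40]],
    [[100, 10], [180, 10], [180, 40], [100, 40]],
]

entry = {
    "image": "rumsey/test/000001.png",
    # One singleton group per word; the grouping itself is ignored in Task 1.
    "groups": [[{"vertices": poly}] for poly in word_polygons],
}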

Evaluation Metric

Coming soon.

 

Task 2 - Phrase Detection

This task requires words to be detected (with their polygon boundaries) and grouped into the label phrases they constitute. Words from the same group (phrase) are treated as one unit for (joint) detection.

Submission Format

As in Task 1, the optional "text" field of each word is ignored in evaluation and not required in the JSON file; this allows the same file to be submitted for every task. However, the evaluation is sensitive to group-level organization among the words. The word order is not considered during evaluation.
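
Here the grouping itself carries the prediction. A minimal Python sketch with illustrative values, where each inner list holds the word polygons of one predicted phrase:

phrase_polygons = [
    [[[10, 10], [90, 10], [90, 40], [10, 40]],
     [[100, 10], [180, 10], [180, 40], [100, 40]]],  # one phrase of two words
]

entry = {
    "image": "ign/test/000001.jpg",
    # One group per predicted phrase; word order within a group is not evaluated in Task 2.
    "groups": [[{"vertices": poly} for poly in polys] for polys in phrase_polygons],
}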

Evaluation Metric

Coming soon.

Task 3 - Word Detection and Recognition

In this task, participants are expected to produce word-level text detection and recognition results, e.g., generating a set of word bounding polygons and corresponding transcriptions.

Submission Format

Submissions require the "text" field for each word. As in Task 1, grouping is not required for this task; although the "groups" field is required in the JSON file, any group-level organization is ignored by the evaluation. For example, each image could have one group containing all the words, or each word could belong to its own group.

Evaluation Metric

Coming soon.

Task 4 - Phrase Detection and Recognition

This task requires detection and recognition at the phrase level. Submissions must group words (polygons and transcriptions) into phrases, each given as an ordered list of words.

Submission Format

All elements of the submission file format described above are required; "text" transcriptions and groupings are all considered in the evaluation.

Evaluation Metric

Coming soon.

Evaluation Metric Summary

 Task                                  | Competition Metric | Other Metrics
 1 - Word Detection                    | Coming soon.       | Coming soon.
 2 - Phrase Detection                  | Coming soon.       | Coming soon.
 3 - Word Detection and Recognition    | Coming soon.       | Coming soon.
 4 - Phrase Detection and Recognition  | Coming soon.       | Coming soon.

 

FAQs

  1. What training data is allowed?
    • Can I use private data for the competition?
      • Yes, but only under two conditions:

        1. The use is disclosed with the submission, and
        2. The data is made public for the benefit of the community. To be included in the competition, link(s) to the data must be included with any submission using private data.

        In particular, competitors must take great care to exclude any labeled training data that overlaps with the competition test data set. Historical maps are often printed in multiple editions from the same engraving or scanned into a variety of digital libraries. Entries whose labeled training data is discovered to contain a test map will be disqualified.
    • Can I use synthetic data for the competition?
      • Yes; submitters are strongly encouraged to share their data and/or synthesis method.
    • Can I use labeled public data sets (i.e., ICDAR13, COCOText, ArT, etc.)?
      • Yes; submitters are encouraged to indicate any training data sets used.
    • Can I use publicly available unlabeled data?
      • Yes.
  2. Does my submission have to include both data sets (Rumsey and French) for evaluation?
    • No, the two data sets will be evaluated separately. Omitting one will not influence evaluation of the other. While such "empty" results will necessarily appear in the online evaluation, they will be manually excluded from the competition report.
  3. What is the character set for the data?
    • The Latin character set (including diacritics and digraphs), numerals, punctuation, common keyboard symbols, and select special symbols such as ™ and ®/©.

      Maps from the Rumsey data set are expected to be in English, but because they cover world-wide historical geography, occasional diacritics (e.g., ü or ø) and digraphs (e.g., æ) are to be expected. The annotations for the 19th-century Cadastre maps from the French data set were restricted to the following charset: "abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûüœÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÂÇÉÈÊËÎÏÔÙÛÜŒŸ0123456789'.,-+/*()&=#".
  4. How many submissions can I make?
    • For the competition, each participant may submit as many times as they like, but only the final pre-deadline submission will be included in the competition report and ranking.

Challenge News

Important Dates

2024-12-10: Website live, train/val sets available

2025-03-01: Test set available, submissions are open

2025-04-01: Final submission deadline, including short reports