Tasks - ICDAR 2024 Competition on Handwriting Recognition of Historical Ciphers
Data
For this competition we will use approximately 600 pages of cipher manuscripts. Given the variability of the alphabets, we divide them into two groups:
- Ciphers with digits. These documents are held at the Secret Archive of the Vatican and date from different centuries. These Vatican ciphers are written using 76 different symbols, based mostly on digits with multiple diacritics.
- Ciphers with symbol alphabets. These documents are written using an alphabet of symbols, most of them invented. They include:
  - The Borg cipher, a long encrypted manuscript from the 17th century. The entire manuscript is encoded, except the first and last two pages and some headings in Latin. The cipher consists of 34 different symbols, ranging from graphic signs to Latin letters and some diacritics. The cipher has touching symbols. The plaintext language, before encryption, is Latin.
  - The Copiale cipher, a long encrypted manuscript from the mid-18th century. The cipher consists of 100 different symbols, including symbols from the Latin and Greek alphabets, as well as some ideograms (graphic symbols) that represent special entities. The plaintext language, before encryption, is German.
  - Enciphered documents from the Bibliothèque nationale de France. These 16th-century letters, written by various French nobles at the time of the French King François I, are entirely in cipher. There are about 37 distinct types of graphical symbols, generally non-touching. The plaintext language, before encryption, is French.
  - The Ramanacoil manuscript, a document from 1674 kept in the National Archives of the Netherlands. It employs 24 unique symbols for the Latin alphabet (without V and J), additional special symbols for double letters (EE, FF, LL, OO, and PP), and special symbols for seven important words (e.g. “Ramanacoil”). Symbols generally don’t touch. The plaintext language, before encryption, is Dutch.
For the competition, each cipher page has been segmented into text lines and manually transcribed, so each text-line image has a corresponding transcription file.
Tasks
To make participation possible for as many researchers as possible, we propose several tasks:
- Task 1 (Level Easy): alphabet of digits, sufficient number of pages for training.
- Task 2A Borg (Level Medium): alphabet of symbols, sufficient number of pages for training.
- Task 2B Copiale (Level Medium): alphabet of symbols, sufficient number of pages for training.
- Task 3A BNF (Level Difficult): alphabet of symbols, few pages for training.
- Task 3B Ramanacoil (Level Difficult): alphabet of symbols, few pages for training.
Participants are free to take part in one or more tasks.
Evaluation
The evaluation will be carried out at line level. Because ciphertexts deliberately avoid grouping symbols into words, to make deciphering more difficult, the evaluation will be based on the Character Error Rate (CER) rather than on a word-level measure.
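As an illustration of a line-level CER computation, the following minimal Python sketch may be helpful. It is not the organizers' scoring script. It assumes that each transcription line has already been split into a sequence of symbol tokens (the token values below are made up), and that the score is pooled over all lines as total edit operations divided by total reference symbols. The page does not specify the transcription format or how per-line scores are aggregated, so both choices are assumptions.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of symbol tokens."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis symbols.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference symbol
                curr[j - 1] + 1,     # insertion of a hypothesis symbol
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(references: List[Sequence[str]], hypotheses: List[Sequence[str]]) -> float:
    """Pooled CER: total edits divided by total reference symbols.

    Assumption: one hypothesis line per reference line, in the same order.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same number of lines")
    total_symbols = sum(len(r) for r in references)
    if total_symbols == 0:
        raise ValueError("references contain no symbols")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_symbols


# Hypothetical example: two text lines of a digit cipher, one token per symbol.
refs = [["4", "7", "1", "2"], ["9", "3"]]
hyps = [["4", "1", "2"], ["9", "8", "3"]]
print(f"CER = {cer(refs, hyps):.3f}")
```

Working on token sequences rather than raw strings means a symbol transcribed with several characters (for example a digit plus a diacritic code) counts as a single unit. Whether the official evaluation counts symbols this way is not stated here.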
Important Dates
10 January 2024: Competition Announced
20 January 2024: Training data released
18 February 2024: Test data released
10 May 2024: Submission of results