Tasks - Document Information Localization and Extraction

The DocILE benchmark consists of two tracks, briefly introduced below.
For both tracks, the inputs are business documents in PDF format. The DocILE repository provides tools to work with the dataset, load documents as images, use pre-computed OCR, etc.
For full detail, please refer to the DocILE dataset paper.
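
As a quick illustration, the sketch below shows how a document and its pre-computed OCR might be loaded with the docile library. The class and method names here are assumptions based on typical usage and may differ slightly; please consult the repository documentation for the authoritative API.

```python
# Minimal sketch of accessing the dataset with the docile library.
# NOTE: the exact class/method names below are assumptions -- see the
# DocILE repository documentation for the authoritative API.
from docile.dataset import Dataset

dataset = Dataset("val", "path/to/docile")   # split name and dataset root
document = dataset[0]                        # first document in the split
page_image = document.page_image(0)          # render page 0 as an image
ocr_words = document.ocr.get_all_words(0)    # pre-computed word-level OCR
kile_fields = document.annotation.fields     # ground-truth KILE fields
li_fields = document.annotation.li_fields    # ground-truth line item fields
```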

Track 1: Key Information Localization and Extraction (KILE)

The goal of the first track is to localize key information of pre-defined categories (field types) in the document. We focus the benchmark on detecting semantically important values corresponding to tens of different field types rather than on fine-tuning the underlying text recognition.

To this end, we provide word-level text detections for each document, choose an evaluation metric (below) that does not depend on the text recognition quality, and simplify the challenge task so that the primary metric only requires correct localization of the values in the documents. Text extractions are checked, in addition to locations and field types, in a separate evaluation; the leaderboard ranking does not depend on it. Post-processing of values (deduplication, converting dates to a standardized format, etc.), despite being needed in practice, is not performed. With these simplifications, the main task can also be viewed as a detection problem. Note that when several instances of the same field type are present, all of them should be detected.
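
For illustration only, a KILE prediction can be viewed as a list of localized fields, each carrying a field type, a page index, and a bounding box. The dict-based representation and the field type names below are simplified assumptions, not the official submission format; see the repository for the exact format.

```python
# Simplified sketch of KILE predictions for one document (illustrative only,
# not the official submission format). Field type names are examples and
# bounding boxes are assumed to be relative page coordinates
# (left, top, right, bottom).
predictions = [
    {"fieldtype": "date_issue", "page": 0, "bbox": (0.64, 0.12, 0.74, 0.14)},
    {"fieldtype": "amount_total", "page": 0, "bbox": (0.71, 0.52, 0.83, 0.54)},
    # When several instances of the same field type are present,
    # all of them should be predicted:
    {"fieldtype": "tax_rate", "page": 0, "bbox": (0.55, 0.60, 0.58, 0.62)},
    {"fieldtype": "tax_rate", "page": 0, "bbox": (0.55, 0.63, 0.58, 0.65)},
]
```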

Evaluation Metric: Since the task is framed as a detection problem, the standard Average Precision metric is used as the main evaluation metric. Unlike the common practice in object detection, where true positives are determined by thresholding the Intersection-over-Union, we use a criterion tailored to evaluating how useful a detection is for text read-out. Inspired by the CLEval metric used in text detection, we measure whether the predicted area contains nothing but the character centers related to the target field. Since character-level annotations are hard to obtain, we use the CLEval definition of the Pseudo-Character Center (PCC).
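
To make the criterion concrete, the sketch below computes CLEval-style Pseudo-Character Centers by spacing characters uniformly along each word box, and treats a prediction as a true positive if it covers all PCCs of the target field and no PCCs of unrelated text. The helper names and the handling of edge cases (e.g., multi-word fields) are simplifying assumptions; the official evaluation code in the DocILE repository is authoritative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)
Point = Tuple[float, float]

def pseudo_character_centers(word_box: Box, num_chars: int) -> List[Point]:
    """CLEval-style PCCs: characters assumed evenly spaced along the word box."""
    left, top, right, bottom = word_box
    y_center = (top + bottom) / 2
    width = right - left
    return [(left + width * (i + 0.5) / num_chars, y_center) for i in range(num_chars)]

def contains(box: Box, point: Point) -> bool:
    left, top, right, bottom = box
    x, y = point
    return left <= x <= right and top <= y <= bottom

def is_true_positive(pred_box: Box, field_pccs: List[Point], other_pccs: List[Point]) -> bool:
    """A prediction (of the matching field type) counts as correct if it covers
    all PCCs of the ground-truth field and no PCCs of unrelated text."""
    return (all(contains(pred_box, p) for p in field_pccs)
            and not any(contains(pred_box, p) for p in other_pccs))
```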

Track 2: Line Item Recognition (LIR)

The goal of the second track is to localize key information of pre-defined categories (field types) and group it into line items. A Line Item (LI) is a tuple of fields (e.g., description, quantity, and price) describing a single object instance to be extracted, e.g., a row in a table.

Evaluation Metric: The main evaluation metric is the micro F1 score over all line item fields. A predicted line item field is correct if it fulfills the requirements from Track 1 (on field type and location) and if it is assigned to the correct line item. Since the matching of ground truth (GT) and predicted line items may not be straightforward due to errors in the prediction, our evaluation chooses the best matching in two steps: (1) for each pair of predicted and GT line items, the predicted fields are evaluated as in Track 1; (2) a maximum matching between predicted and GT line items is then found, maximizing the overall recall.
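
As a sketch of the second step, the maximum matching between predicted and GT line items can be found with a Hungarian-style assignment over per-pair counts of recalled fields; the input structure below is a simplifying assumption and the snippet is not the official evaluation code. Micro F1 is then computed over the fields of the matched pairs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_line_items(recall_counts: np.ndarray) -> list:
    """Pair predicted and GT line items so that the total number of recalled
    fields is maximal.

    recall_counts[i, j] = number of fields of GT line item j recovered by
    predicted line item i (each field judged with the Track 1 criterion).
    Returns a list of (pred_index, gt_index) pairs.
    """
    # linear_sum_assignment minimizes cost, so negate the counts to maximize recall.
    pred_idx, gt_idx = linear_sum_assignment(-recall_counts)
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))

# Example: 2 predicted line items vs. 3 GT line items.
counts = np.array([[3, 0, 1],
                   [1, 2, 0]])
print(match_line_items(counts))  # -> [(0, 0), (1, 1)]
```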

Benchmark Dataset Rules

All submissions to the benchmark must comply with the following rules:

  • Submission is done via the Robust Reading Competition (RRC) portal.
  • The use of external document datasets, as well as models trained on such datasets, is prohibited, in order to allow a clear comparative evaluation of methods that use the provided collection of labeled and unlabeled documents. Other external datasets and checkpoints (ImageNet, Wikipedia text corpora, ...) can be used as long as they are publicly available.
  • The training and validation sets can be used in any way for model training (e.g., using a custom training/validation split is allowed).
  • Predictions on the test set cannot be annotated and/or edited manually.
  • Predictions on test documents are not required to be mutually independent (e.g., it is allowed to adjust predictions based on statistics over all test predictions).

To make a submission valid for the DocILE'23 competition and eligible for prizes, refer to the full competition rules.

Important Dates

  • Test set published and submissions open: 24 Apr 2023
  • Register by: 28 Apr 2023 (at http://clef2023-labs-registration.dei.unipd.it/)
  • Benchmark submission deadline: 24 May 2023
  • Working note submission deadline: 5 June 2023
  • Notification of working note acceptance: 23 June 2023


Note: DocILE 2023 runs simultaneously as a CLEF 2023 Lab and follows the CLEF 2023 schedule: https://clef2023.clef-initiative.eu/index.php?page=Pages/schedule.html

Working note papers: In addition to providing a method description for the submission on the RRC website, a working note must be submitted to the organizers as explained in the competition rules: https://docile.rossum.ai/static/docile_rules_and_prize_eligibility.pd