Overview - Document UnderstanDing of Everything 😎

The DUDE challenge seeks to foster research on document understanding in a real-world setting with potential distribution shifts between training and test splits. In contrast to previous datasets, we extensively source multi-domain, multi-purpose, and multi-page documents of various types, origins, and dates. Importantly, we bridge the as-yet unaddressed gap between the Document Layout Analysis (DLA) and Question Answering (QA) paradigms by introducing complex layout-navigating questions and unique problems that often demand advanced information processing or multi-step reasoning.


What differentiates DUDE from the previous ICDAR shared tasks? 

As in previous work on DocVQA [1, 2], we evaluate models as natural language interfaces to Visually-Rich Documents (VRDs), which allows generalizing to various tasks and domains. Although DUDE resembles these and other QA datasets, a point-by-point comparison makes the differences clear.

  • Documents. Existing QA datasets collect relatively homogeneous documents, such as invoices or financial reports [3], or documents restricted to one or a few domains. DUDE covers a broad spectrum of types, domains, sources, and dates representative of a modern document flow. Additionally, DUDE is not restricted to single-page document excerpts, which makes it possible to measure the ability to process the long inputs prevalent in business applications.
  • Questions. In addition to conventional document QA, we formulate questions requiring comprehension beyond the document content, such as ‘how many text columns are there?’, ‘does the document contain words with diacritics?’ or ‘which page contains the largest table in the document?’. Moreover, we provide questions that require arithmetic and comparison operations on numbers and dates, as well as multi-hop questions that evaluate the model’s robustness in sequential, step-by-step reasoning.
  • Answers. To cover a broad range of business use cases, we provide both abstractive and extractive answers. These cover various answer types, such as textual, numerical, dates, yes/no, lists, or non-answerable. The last type requires the model to recognize that no answer can be given because the question is not well formed for the document, e.g., it asks for the value of an empty cell in a table.
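To make the answer types above more concrete, here is a minimal Python sketch of how one question/answer record could be represented. It is only an illustration and not the official DUDE annotation format: the names `AnswerKind`, `QARecord`, and `is_extractive` are hypothetical, and the substring test is a simplifying assumption about what counts as an extractive answer.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AnswerKind(Enum):
    # Answer types named in the list above; the identifiers are hypothetical.
    TEXT = "text"
    NUMBER = "number"
    DATE = "date"
    YES_NO = "yes/no"
    LIST = "list"
    NOT_ANSWERABLE = "not-answerable"


@dataclass
class QARecord:
    """Illustrative question/answer record (not the official DUDE schema)."""
    question: str
    kind: AnswerKind
    answers: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.kind is AnswerKind.NOT_ANSWERABLE:
            # A non-answerable question must carry no answer at all.
            if self.answers:
                raise ValueError("non-answerable records must have no answers")
        elif not self.answers:
            raise ValueError("answerable records need at least one answer")
        elif self.kind is not AnswerKind.LIST and len(self.answers) != 1:
            raise ValueError("only list answers may hold several values")


def is_extractive(answer: str, document_text: str) -> bool:
    """Rough check (an assumption): an extractive answer appears verbatim in
    the document text, ignoring case; anything else counts as abstractive."""
    return answer.casefold() in document_text.casefold()


if __name__ == "__main__":
    text = "Invoice date: 12 March 2021. Total: 40 EUR."
    rec = QARecord("What is the invoice date?", AnswerKind.DATE, ["12 March 2021"])
    print(is_extractive(rec.answers[0], text))
    empty = QARecord("What is the value of the empty cell?", AnswerKind.NOT_ANSWERABLE)
    print(empty.answers)
```

Keeping the non-answerable case as an explicit kind with an empty answer list, rather than a special string, lets a consumer of such records distinguish "the model should abstain" from an ordinary textual answer.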


[1] Minesh Mathew, Rubèn Tito, Dimosthenis Karatzas, R. Manmatha, C.V. Jawahar, "Document Visual Question Answering Challenge 2020", DAS 2020

[2] Rubèn Tito, Minesh Mathew, C.V. Jawahar, Ernest Valveny, Dimosthenis Karatzas, "ICDAR 2021 Competition on Document Visual Question Answering", ICDAR 2021

[3] F. Zhu, W. Lei, F. Feng, C. Wang, H. Zhang, T.S. Chua, "Towards Complex Document Understanding by Discrete Reasoning", Proceedings of the 30th ACM International Conference on Multimedia, pp. 4857-4866, October 2022

Important Dates

Registration open: Christmas 2022

Competition Q&A period: 20 December 2022 - 10 January 2023

Sample dataset available: 6 January 2023

Training/validation dataset available: 30 January 2023

Test set submissions open (Task 1 & evaluation phase 2): 9 March 2023

General submission deadline: 20 April 2023

Method description submission deadline: 20 April 2023

Notification to authors: 1 May 2023

All dates are 23:59 AoE and subject to change.


Note on the registration for the DUDE challenge:

There is no need to register explicitly for the DUDE challenge. As long as you are registered on the RRC portal, you will be able to submit your results once submissions open.

For any questions, contact the DUDEs at duchallenge.icdar2023@gmail.com