Overview - Document UnderstanDing of Everything 😎
The DUDE challenge seeks to foster research on document understanding in a real-world setting with potential distribution shifts between training and test splits. In contrast to previous datasets, we extensively source multi-domain, multi-purpose, and multi-page documents of various types, origins, and dates. Importantly, we bridge the as-yet unaddressed gap between the Document Layout Analysis (DLA) and Question Answering (QA) paradigms by introducing complex layout-navigating questions and unique problems that often demand advanced information processing or multi-step reasoning.
What differentiates DUDE from the previous ICDAR shared tasks?
Similar to previous work in DocVQA [1, 2], we evaluate models as a natural language interface to VRDs (Visually-Rich Documents), an approach that generalizes to various tasks and domains. Although DUDE resembles these and other QA datasets, the differences become clear in a point-by-point comparison.
- Documents. Existing QA datasets contain relatively homogeneous documents, such as invoices or financial reports [3], or documents restricted to one or a few domains. DUDE covers a broad spectrum of types, domains, sources, and dates representative of a modern document flow. Additionally, DUDE is not restricted to single-page document excerpts, which allows measuring a model's ability to process the long inputs prevalent in business applications.
- Questions. In addition to conventional document QA, we formulate questions that require comprehension beyond the document content, such as ‘how many text columns are there?’, ‘does the document contain words with diacritics?’, or ‘which page contains the largest table in the document?’. Moreover, we provide questions that require arithmetic and comparison operations on numbers and dates, as well as multi-hop questions that evaluate the model’s robustness in sequential, step-by-step reasoning.
- Answers. To cover a wide range of business use cases, we provide both abstractive and extractive answers. These span various answer types, such as textual, numerical, dates, yes/no, lists, or non-answerable. The latter requires the model to correctly recognize that no answer can be provided because the question is ill-posed, e.g., it asks about the value of an empty cell in a table.
[1] Minesh Mathew, Ruben Tito, Dimosthenis Karatzas, R. Manmatha, C.V. Jawahar, "Document Visual Question Answering Challenge 2020", DAS 2020.
[2] Rubèn Tito, Minesh Mathew, C.V. Jawahar, Ernest Valveny, Dimosthenis Karatzas, "ICDAR 2021 Competition on Document Visual Question Answering", ICDAR 2021.
[3] F. Zhu, W. Lei, F. Feng, C. Wang, H. Zhang, T.S. Chua, "Towards Complex Document Understanding by Discrete Reasoning", Proceedings of the 30th ACM International Conference on Multimedia, October 2022, pp. 4857-4866.
Do NOT use qq.com emails to register or contact us
Registration open: Christmas 2022
Competition Q&A period: 20 December 2022 to 10 January 2023
Sample dataset available: 6 January 2023
Training/validation dataset available: 30 January 2023
Test set submissions open (Task 1 & evaluation phase 2): 9 March 2023
General submission deadline: 20 April 2023
Method description submissions deadline: 20 April 2023
Notification to authors: 1 May 2023
All times are 23:59 AoE (Anywhere on Earth), and all dates are subject to change.
Note on registration for the DUDE challenge:
There is no need to register explicitly for the DUDE challenge. As long as you are registered on the RRC portal, you will be able to submit your results once submissions open.
For any questions, contact the DUDEs at firstname.lastname@example.org