Tasks - Document Visual Question Answering

The challenge will comprise two tasks. The first is a typical VQA task: natural language questions are defined over single documents, and an answer must be generated by interpreting the document image. No list of pre-defined responses will be given, so the problem cannot easily be treated as an n-way classification task. The second is a retrieval-style task: given a question, the aim is to identify and retrieve all the documents in a large collection that are relevant to answering it.
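The difference between the two tasks is easiest to see in their input and output shapes. The sketch below is purely illustrative and is not the challenge's data format or a proposed method: the `Document` class, the function names, the `min_overlap` parameter, and the word-overlap heuristic are all hypothetical, and plain text lines stand in for what would really be a document image.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    doc_id: str       # hypothetical identifier
    lines: List[str]  # stand-in for the content of a document image


def _tokens(text: str) -> set:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,?:;").lower() for w in text.split()} - {""}


def answer_question(question: str, document: Document) -> str:
    """Single-document VQA shape: one question, one document, one free-form
    string. The toy heuristic returns the line sharing the most words with
    the question; there is no fixed list of candidate answers to pick from."""
    q = _tokens(question)
    return max(document.lines, key=lambda line: len(q & _tokens(line)))


def retrieve_documents(question: str, collection: List[Document],
                       min_overlap: int = 2) -> List[str]:
    """Retrieval shape: one question, a whole collection, a list of ids.
    The toy heuristic keeps every document with a line that shares at
    least `min_overlap` words with the question."""
    q = _tokens(question)
    return [
        d.doc_id
        for d in collection
        if any(len(q & _tokens(line)) >= min_overlap for line in d.lines)
    ]


docs = [
    Document("a", ["Invoice date: 3 May 1998", "Total due: 40 dollars"]),
    Document("b", ["Meeting agenda", "Lunch at noon"]),
]
print(answer_question("What is the invoice date?", docs[0]))
print(retrieve_documents("What is the invoice date?", docs))
```

In the first call the output is an open-ended string tied to one document. In the second it is a set of document identifiers drawn from the whole collection.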