Method: M4C (single model) - Task 3 - Open Dictionary - ICDAR 2019 Robust Reading Challenge on Scene Text Visual Question Answering

method: M4C (single model), 2019-11-02

Authors: Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach

Affiliation: Facebook AI Research (FAIR); University of California, Berkeley

Description: We propose a novel model for the TextVQA task based on a multimodal transformer architecture with iterative answer prediction and rich feature representations for OCR tokens. It outperforms previous work by a large margin on three datasets (TextVQA, ST-VQA, and OCR-VQA).
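As a rough illustration of "pointer-augmented" iterative answer prediction, here is a minimal sketch under stated assumptions, not the authors' code. At each decoding step, a linear layer scores a fixed answer vocabulary, and a bilinear dot product between the decoder state and each OCR token scores the detected scene-text words. A single argmax over the concatenated scores picks either a vocabulary word or an OCR token, and decoding repeats until an end token appears. The multimodal transformer itself is replaced by a hypothetical `toy_state` lookup. All names (`step_scores`, `greedy_decode`, `toy_state`), dimensions, the step cap, and the numbers are invented for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [dot(row, x) for row in w]


def step_scores(z_t: Sequence[float], vocab_w: Matrix,
                ocr_feats: Sequence[Sequence[float]],
                w_ocr: Matrix, w_dec: Matrix) -> Vector:
    """Scores over [fixed vocabulary; OCR tokens] for one decoding step."""
    # Fixed-vocabulary scores: a linear classifier on the decoder state.
    vocab_scores = matvec(vocab_w, z_t)
    # Pointer scores: bilinear match between the projected decoder state
    # and each projected OCR-token representation.
    query = matvec(w_dec, z_t)
    ocr_scores = [dot(matvec(w_ocr, o), query) for o in ocr_feats]
    # Concatenation lets one argmax choose a vocab word or an OCR token.
    return vocab_scores + ocr_scores


def greedy_decode(decoder_state: Callable[[List[int]], Sequence[float]],
                  vocab: List[str], ocr_tokens: List[str],
                  vocab_w: Matrix, ocr_feats: Sequence[Sequence[float]],
                  w_ocr: Matrix, w_dec: Matrix,
                  max_steps: int = 12, eos: str = "<eos>") -> str:
    """Builds the answer one token per step (max_steps is an illustrative cap)."""
    history: List[int] = []  # indices chosen at earlier steps
    words: List[str] = []
    for _ in range(max_steps):
        # A real model would compute z_t from question, visual, OCR, and
        # previously predicted tokens; here the caller supplies it.
        z_t = decoder_state(history)
        scores = step_scores(z_t, vocab_w, ocr_feats, w_ocr, w_dec)
        best = max(range(len(scores)), key=scores.__getitem__)
        history.append(best)
        if best < len(vocab):
            word = vocab[best]
        else:
            word = ocr_tokens[best - len(vocab)]
        if word == eos:
            break
        words.append(word)
    return " ".join(words)


# Toy 2-d example with hypothetical numbers.
VOCAB = ["<eos>", "the", "a"]
OCR = ["STOP", "CAFE"]
VOCAB_W = [[-1.0, -1.0], [0.5, 0.5], [0.1, 0.1]]
OCR_FEATS = [[1.0, 0.0], [0.0, 1.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
TOY_STATES = [[0.0, 3.0], [2.5, 0.0], [-1.0, -1.0]]


def toy_state(history: List[int]) -> Sequence[float]:
    # Hypothetical stand-in for the transformer: a fixed state per step.
    return TOY_STATES[min(len(history), len(TOY_STATES) - 1)]


print(greedy_decode(toy_state, VOCAB, OCR, VOCAB_W, OCR_FEATS,
                    IDENTITY, IDENTITY))  # prints "CAFE STOP"
```

In this toy run, the first two steps copy OCR tokens that are not in the fixed vocabulary. This is the motivation for a pointer: answers can contain scene text never seen during training, which a vocabulary-only classifier could not produce.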

R. Hu, A. Singh, T. Darrell, M. Rohrbach, Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA. arXiv preprint arXiv:1911.06258, 2019 (to appear in CVPR 2020)

Source code