Method: M4C (single model)

Date: 2019-11-02

Authors: Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach

Affiliation: Facebook AI Research (FAIR); University of California, Berkeley

Email: ronghang.hu@gmail.com

Description: We propose a novel model for the TextVQA task based on a multimodal transformer architecture with iterative answer prediction and rich feature representations for OCR tokens. The model outperforms previous work by a large margin on three datasets.