Method: M4C (single model)
Date: 2019-11-02
Authors: Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach
Affiliation: Facebook AI Research (FAIR); University of California, Berkeley
Email: ronghang.hu@gmail.com
Description: We propose a novel model for the TextVQA task. It is based on a multimodal transformer architecture with iterative answer prediction and rich feature representations for OCR tokens, and it substantially outperforms previous work on three datasets (TextVQA, ST-VQA, and OCR-VQA).