method: ssbaseline
Date: 2020-09-09
Authors: Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu
Affiliation: Northwestern Polytechnical University
Email: zephyrzhuqi@gmail.com
Description: We hope this work sets a new baseline for these two OCR-text-related applications and inspires new thinking on multi-modality encoder design.
method: TIG
Date: 2020-08-15
Authors: Xiangpeng Li
Description: Text-Instance Graph: We build an OCR-Obj graph using the overlapping relationships between OCR token texts and visual instances in the image. A question-conditioned multi-step graph attention network is then adopted to extend the perception of each node, so that each node is described by its neighboring nodes.
method: M4C (single model)
Date: 2019-11-02
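The graph construction described for TIG can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes boxes are given as `(x1, y1, x2, y2)` tuples and that "overlapping" means the two boxes share positive area. The function names `boxes_overlap` and `build_ocr_obj_edges` are hypothetical.

```python
from itertools import product


def boxes_overlap(a, b):
    """Return True if two (x1, y1, x2, y2) boxes share positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def build_ocr_obj_edges(ocr_boxes, obj_boxes):
    """Return (ocr_index, obj_index) pairs for every overlapping box pair.

    Assumption: an edge exists whenever an OCR token box and an object
    box intersect; the actual overlap criterion used by TIG is not stated.
    """
    return [
        (i, j)
        for (i, ocr), (j, obj) in product(enumerate(ocr_boxes), enumerate(obj_boxes))
        if boxes_overlap(ocr, obj)
    ]


# Two OCR tokens and three detected objects (made-up coordinates).
ocr_boxes = [(10, 10, 50, 30), (200, 200, 240, 220)]
obj_boxes = [(0, 0, 100, 100), (150, 150, 300, 300), (400, 400, 500, 500)]
edges = build_ocr_obj_edges(ocr_boxes, obj_boxes)
```

The resulting edge list could then feed a graph attention layer in which each node aggregates features from its neighbors.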
Authors: Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach
Affiliation: Facebook AI Research (FAIR); University of California, Berkeley
Email: ronghang.hu@gmail.com
Description: We propose a novel model for the TextVQA task based on a multimodal transformer architecture with iterative answer prediction and rich feature representations for OCR tokens, largely outperforming previous work on three datasets.
Date | Method | Score
---|---|---
2020-09-09 | ssbaseline | 0.5500
2020-08-15 | TIG | 0.5051
2019-11-02 | M4C (single model) | 0.4621
2019-04-29 | Focus: A bottom-up approach for Scene Text VQA | 0.0882