method: TWA (2022-03-15)

Authors: Zan-Xia Jin, Mike Zheng Shou, Fang Zhou, Satoshi Tsutsui, Jingyan Qin, Xu-Cheng Yin

Affiliation: University of Science and Technology Beijing, National University of Singapore

Description: We propose an OCR Token-Word Contrastive (TWC) learning task, which pre-trains word representations by augmenting OCR tokens using the Levenshtein distance between each OCR token and the words in a dictionary.
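The description does not say exactly how OCR tokens and dictionary words are paired. As a rough illustration only, the Python sketch below computes the Levenshtein (edit) distance with the standard dynamic-programming recurrence. It assumes that dictionary words within a small distance of an OCR token count as that token's augmented positives, and collects those candidates for each token. The helper names (`levenshtein`, `candidate_words`, `build_pairs`), the `max_dist` cutoff and the lowercasing are hypothetical choices, not taken from the TWA submission.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using a rolling-row DP table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def candidate_words(token: str, dictionary: List[str], max_dist: int = 1) -> List[str]:
    """Hypothetical augmentation rule: dictionary words within max_dist edits of the token."""
    token = token.lower()
    return [w for w in dictionary if levenshtein(token, w.lower()) <= max_dist]


def build_pairs(ocr_tokens: List[str], dictionary: List[str],
                max_dist: int = 1) -> Dict[str, List[str]]:
    """Map each OCR token to the dictionary words treated as its positive pairs."""
    return {t: candidate_words(t, dictionary, max_dist) for t in ocr_tokens}


if __name__ == "__main__":
    vocab = ["coffee", "coffer", "stop", "shop", "open"]
    print(build_pairs(["cofee", "st0p", "opem"], vocab))
    # → {'cofee': ['coffee'], 'st0p': ['stop'], 'opem': ['open']}
```

A brute-force scan over the dictionary is fine for a toy vocabulary. A real pre-training pipeline would more likely index the dictionary (for example with a BK-tree) before querying it.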

method: ssbaseline (2020-09-09)

Authors: Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu

Affiliation: Northwestern Polytechnical University

Email: zephyrzhuqi@gmail.com

Description: We hope this work sets a new baseline for these two OCR-text-related applications (TextVQA and TextCaps) and inspires new thinking about multimodal encoder design.

method: TIG (2020-08-15)

Authors: Xiangpeng Li

Description: Text-Instance Graph: We build an OCR-Obj graph from the overlap relationships between OCR tokens and visual instances in the image. A question-conditioned, multi-step graph attention network is then applied to extend the perception of each node, so that each node is described by its neighboring nodes.
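The two stages in the TIG description can be sketched in plain Python under several assumptions:

- Boxes are axis-aligned `(x1, y1, x2, y2)` tuples.
- An OCR node and an object node are connected when their intersection covers at least half of the smaller box.
- The question-conditioned attention score is a question-gated bilinear product, followed by a softmax over each node's neighbors.

None of these choices come from the submission, and neither do the names `overlap`, `build_graph`, `attend_step` and `multi_step`. The sketch only shows how repeated attention hops let each node be described by its neighbors.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Vec = List[float]


def overlap(a: Box, b: Box) -> float:
    """Intersection area over the smaller box's area (assumed overlap criterion)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return iw * ih / smaller if smaller > 0 else 0.0


def build_graph(ocr_boxes: Sequence[Box], obj_boxes: Sequence[Box],
                thresh: float = 0.5) -> List[List[int]]:
    """Adjacency lists over nodes [OCR tokens..., objects...].

    Every node keeps a self-loop; an OCR node and an object node are linked
    when their overlap reaches `thresh` (hypothetical threshold).
    """
    nodes = list(ocr_boxes) + list(obj_boxes)
    n_ocr = len(ocr_boxes)
    adj = [[i] for i in range(len(nodes))]
    for i in range(n_ocr):
        for j in range(n_ocr, len(nodes)):
            if overlap(nodes[i], nodes[j]) >= thresh:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def attend_step(feats: List[Vec], adj: List[List[int]], q: Vec) -> List[Vec]:
    """One question-conditioned attention hop.

    The score for edge i->j is sum_k q[k] * h_i[k] * h_j[k] (an assumed
    question-gated bilinear form). Scores are softmax-normalized over the
    neighbors of i, and h_i becomes the weighted sum of neighbor features.
    """
    out = []
    for i, nbrs in enumerate(adj):
        scores = [sum(qk * a * b for qk, a, b in zip(q, feats[i], feats[j])) for j in nbrs]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        dim = len(feats[i])
        out.append([sum(wk / z * feats[j][d] for wk, j in zip(w, nbrs)) for d in range(dim)])
    return out


def multi_step(feats: List[Vec], adj: List[List[int]], q: Vec, steps: int = 2) -> List[Vec]:
    """Repeat the hop so information can travel up to `steps` edges."""
    for _ in range(steps):
        feats = attend_step(feats, adj, q)
    return feats


if __name__ == "__main__":
    ocr = [(10, 10, 20, 20), (100, 100, 110, 110)]  # one token inside the object, one far away
    obj = [(0, 0, 50, 50)]
    adj = build_graph(ocr, obj)
    print(adj)  # → [[0, 2], [1], [2, 0]]
    feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    print(multi_step(feats, adj, q=[1.0, 1.0]))
```

With `steps=2`, an OCR token can pick up information from another token that overlaps the same object, because the path runs token → object → token.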

Ranking Table

Date        Method                                          Score
2022-03-15  TWA                                             0.5774
2020-09-09  ssbaseline                                      0.5500
2020-08-15  TIG                                             0.5051
2019-11-02  M4C (single model)                              0.4621
2019-04-30  VTA                                             0.2820
2019-04-30  QAQ                                             0.2563
2019-04-22  Clova AI OCR                                    0.2155
2019-04-29  USTB-TQA                                        0.1702
2019-04-29  USTB-TVQA                                       0.0952
2019-04-29  Focus: A bottom-up approach for Scene Text VQA  0.0882
