method: TWA 2022-03-15

Authors: Zan-Xia Jin, Mike Zheng Shou, Fang Zhou, Satoshi Tsutsui, Jingyan Qin, Xu-Cheng Yin

Affiliation: University of Science and Technology Beijing, National University of Singapore

Description: We propose an OCR Token-Word Contrastive (TWC) learning task, which pre-trains word representation by augmenting OCR tokens via the Levenshtein distance between the OCR tokens and words in a dictionary.
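As a rough illustration of the distance mentioned above, the following sketch computes the Levenshtein distance and uses it to look up dictionary words close to a noisy OCR token. The function names, the lowercase normalization, and the `max_distance` threshold are assumptions made for this example; the description does not say how the authors select or use the matched words during contrastive pre-training.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, or substitution costs 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearest_words(token: str, dictionary: Iterable[str],
                  max_distance: int = 2) -> List[str]:
    """Dictionary words within max_distance of token, closest first.

    max_distance is a hypothetical threshold chosen for illustration.
    """
    scored = [(levenshtein(token.lower(), w.lower()), w) for w in dictionary]
    return [w for d, w in sorted(scored) if d <= max_distance]
```

For instance, calling `nearest_words("c0ffee", ["coffee", "toffee", "tea"])` ranks "coffee" ahead of "toffee" and drops "tea", which is the kind of candidate list an augmentation step could draw from.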

method: ssbaseline 2020-09-09

Authors: Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu

Affiliation: Northwestern Polytechnical University


Description: We hope this work sets a new baseline for these two OCR-text-related applications and inspires new thinking about multi-modality encoder design.

method: TIG 2020-08-15

Authors: Xiangpeng Li

Description: Text-Instance Graph: We build an OCR-Obj graph using the overlapping relationships between OCR token texts and visual instances in the image. A question-conditioned multi-step graph attention network is then adopted to extend the perception of each node, so that each node is described by its neighboring nodes.
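The graph construction step can be sketched as follows, under the assumption that "overlapping" means the bounding boxes of an OCR token and a visual instance share any area. That criterion, the box format `(x1, y1, x2, y2)`, and the function names are all hypothetical choices for this example; the description does not give the exact rule, and the graph attention network itself is not shown.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two axis-aligned boxes share a region of positive area."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def build_ocr_obj_graph(ocr_boxes: Sequence[Box],
                        obj_boxes: Sequence[Box]) -> List[Tuple[int, int]]:
    """Edges (ocr_index, obj_index) for every overlapping OCR/object pair."""
    return [(i, j)
            for i, ocr in enumerate(ocr_boxes)
            for j, obj in enumerate(obj_boxes)
            if boxes_overlap(ocr, obj)]
```

An edge list like this could then be passed to a graph attention layer, so that each OCR node attends to the objects it sits on and vice versa.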

Ranking Table

2019-11-02    M4C (single model)                               0.4621
2019-04-22    Clova AI OCR                                     0.2155
2019-04-29    Focus: A bottom-up approach for Scene Text VQA   0.0882

Ranking Graphic