method: TWA (2022-03-15)
Authors: Zan-Xia Jin, Mike Zheng Shou, Fang Zhou, Satoshi Tsutsui, Jingyan Qin, Xu-Cheng Yin
Affiliation: University of Science and Technology Beijing, National University of Singapore
Description: We propose an OCR Token-Word Contrastive (TWC) learning task, which pre-trains word representation by augmenting OCR tokens via the Levenshtein distance between the OCR tokens and words in a dictionary.
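The Levenshtein-based augmentation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `levenshtein` and `candidate_words`, the `max_distance` threshold, and the lowercase normalization are all assumptions made for illustration. The sketch only shows how dictionary words near an OCR token might be collected; the contrastive pre-training itself is not covered.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def candidate_words(token: str, dictionary: Iterable[str],
                    max_distance: int = 1) -> List[str]:
    """Hypothetical helper: dictionary words within max_distance of the OCR token,
    nearest first. The threshold is an assumed parameter, not taken from the source."""
    token = token.lower()
    scored = [(levenshtein(token, word.lower()), word) for word in dictionary]
    return [word for dist, word in sorted(scored) if dist <= max_distance]
```

For example, a misread token such as `c0ffee` would be paired with `coffee` at distance 1, while `toffee` only qualifies once the assumed threshold is raised to 2.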
method: ssbaseline (2020-09-09)
Authors: Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu
Affiliation: Northwestern Polytechnical University
Email: zephyrzhuqi@gmail.com
Description: We hope this work sets a new baseline for these two OCR-text-related applications and inspires new thinking on multi-modality encoder design.
method: TIG (2020-08-15)
Authors: Xiangpeng Li
Description: Text-Instance Graph: We build an OCR-Obj graph using overlap relationships between OCR token texts and visual instances in the image. A question-conditioned multi-step graph attention network is then adopted to extend the perception of each node, so that each node is described by its neighboring nodes.
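The graph construction step described above can be sketched as follows. This is an illustrative sketch only, not the submitted method: the `(x1, y1, x2, y2)` box format, the names `overlap_area` and `build_ocr_obj_edges`, and the `min_overlap` criterion are assumptions, and the question-conditioned graph attention stage is not shown.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def build_ocr_obj_edges(ocr_boxes: List[Box], obj_boxes: List[Box],
                        min_overlap: float = 0.0) -> List[Tuple[int, int]]:
    """Hypothetical helper: connect OCR token i to visual instance j when their
    boxes overlap by more than min_overlap (an assumed criterion)."""
    edges = []
    for i, ocr_box in enumerate(ocr_boxes):
        for j, obj_box in enumerate(obj_boxes):
            if overlap_area(ocr_box, obj_box) > min_overlap:
                edges.append((i, j))
    return edges
```

Each returned pair `(i, j)` is an edge between OCR token `i` and visual instance `j`; boxes that only touch along an edge have zero intersection area and are not connected under the default threshold.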
Date | Method | Score
---|---|---
2022-03-15 | TWA | 0.5774
2020-09-09 | ssbaseline | 0.5500
2020-08-15 | TIG | 0.5051
2019-11-02 | M4C (single model) | 0.4621
2019-04-30 | VTA | 0.2820
2019-04-30 | QAQ | 0.2563
2019-04-22 | Clova AI OCR | 0.2155
2019-04-29 | USTB-TQA | 0.1702
2019-04-29 | USTB-TVQA | 0.0952
2019-04-29 | Focus: A bottom-up approach for Scene Text VQA | 0.0882