method: SANHL_v1 (2019-04-30)

Authors: In Description

Description: We first detect candidate text lines, then predict the text strings with an ensemble recognition model. The result is submitted by researchers from South China University of Technology, Northwestern Polytechnical University, The University of Adelaide, Lenovo, and Huawei: Canjie Luo*, Yuliang Liu* (* equal contribution), Qingxiang Lin, Hao Chen, Tianwei Wang, Lele Xie, Lu Yang, Shuaitao Zhang, Linjiang Zhang, Tong He, Canyu Xie, Chongyu Liu, Xiaoxue Chen, Jiapeng Wang, Xiangle Chen, Dezhi Peng, Weihong Ma, Peng Wang, Hui Li, Lianwen Jin, Chunhua Shen, Yaqiang Wu, and Liangwei Wang.


method: CRAFT + TPS-ResNet v3 (2019-04-30)

Authors: Youngmin Baek, Chae Young Lee, Jeonghun Baek, Moonbin Yim, Sungrae Park, and Hwalsuk Lee

Description: [Detection part]
We propose a novel text detector called CRAFT. It effectively detects text regions by exploring each character and the affinity between characters. To overcome the lack of individual character-level annotations, our framework exploits pseudo character-level bounding boxes acquired by the learned interim model in a weakly-supervised manner.
[Recognition part]
We use a Thin-Plate Spline (TPS) based spatial transformer network (STN) to normalize the input text images, followed by a ResNet-based feature extractor, a BiLSTM sequence model, and an attention-based decoder.
This model was developed based on our analysis of scene text recognition modules.
See our paper and source code.
# CRAFT + TPS-ResNet v3 (tested with a small image size; all training data used for the recognition model)
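The grouping idea in the detection part — characters and the affinities between them form one connected region on a score map — can be sketched as below. This is a simplified stand-in, not the actual CRAFT post-processing: the `gaussian_heatmap` and `group_regions` helpers are illustrative names, and real CRAFT heatmaps are predicted by the network rather than rendered from known centers.

```python
import numpy as np
from collections import deque

def gaussian_heatmap(h, w, cx, cy, sigma):
    """Render a 2D Gaussian blob centered at (cx, cy) on an h x w map."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def group_regions(score, thresh=0.5):
    """Binarize a score map and return (x0, y0, x1, y1) bounding boxes
    of 4-connected components, found via BFS flood fill."""
    mask = score > thresh
    seen = np.zeros_like(mask)
    h, w = mask.shape
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                x0 = x1 = sx
                y0 = y1 = sy
                while q:
                    y, x = q.popleft()
                    x0, x1 = min(x0, x), max(x1, x)
                    y0, y1 = min(y0, y), max(y1, y)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

# Two character blobs bridged by an affinity blob fuse into one word
# region; a distant character stays its own region.
score = np.maximum.reduce([
    gaussian_heatmap(40, 120, 20, 20, 8),   # character 1
    gaussian_heatmap(40, 120, 35, 20, 8),   # affinity between 1 and 2
    gaussian_heatmap(40, 120, 50, 20, 8),   # character 2
    gaussian_heatmap(40, 120, 100, 20, 8),  # isolated character
])
print(group_regions(score))  # two boxes: the fused word, then the lone character
```

Without the affinity blob, the two character blobs would not overlap at this threshold and would be grouped as separate regions — which is exactly why CRAFT predicts affinity in addition to character regions.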

Training Data
[Detection part]
We pre-trained our CRAFT model on SynthText, ICDAR 2013 FST, and ICDAR 2017 MLT, and fine-tuned it on some of the publicly released datasets of this year's ICDAR challenges: ArT, MLT, and ReCTS.
[Recognition part]
We first generated Chinese synthetic datasets with the MJSynth and SynthText code, then pre-trained our model on this synthetic data together with real datasets (ArT, LSVT, ReCTS, and RCTW). After that, we fine-tuned it on the ReCTS data.

Ranking Table

Date        Method                 Recall  Precision  Hmean   1-NED
2019-04-30  SANHL_v1               93.86%  91.98%     92.91%  81.43%
2021-09-20  ABCNetv2               87.91%  92.89%     90.33%  63.94%
2019-04-30  CRAFT + TPS-ResNet v3  75.89%  78.44%     77.14%  41.68%
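Hmean is the harmonic mean of recall and precision, and 1-NED is one minus the normalized edit distance between predicted and ground-truth strings, averaged over samples. A minimal sketch of both metrics (function names are illustrative; this is not the official evaluation script):

```python
def hmean(recall, precision):
    """Harmonic mean of recall and precision (the Hmean column)."""
    return 2 * recall * precision / (recall + precision)

def levenshtein(a, b):
    """Edit distance between strings a and b, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def one_minus_ned(pred, gt):
    """1 - normalized edit distance for one prediction/target pair."""
    if not pred and not gt:
        return 1.0
    return 1.0 - levenshtein(pred, gt) / max(len(pred), len(gt))

# The SANHL_v1 row: recall 93.86% and precision 91.98% give Hmean 92.91%.
print(round(hmean(0.9386, 0.9198), 4))  # 0.9291
# One wrong character out of five leaves a per-sample 1-NED of 0.8.
print(one_minus_ned("ICDAR", "ICDAP"))  # 0.8
```

The same check reproduces the other rows, e.g. recall 75.89% and precision 78.44% give the CRAFT + TPS-ResNet v3 Hmean of 77.14%.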

Ranking Graphic