method: CRAFT + TPS-ResNet v3 (2019-04-30)

Authors: Youngmin Baek, Chae Young Lee, Jeonghun Baek, Moonbin Yim, Sungrae Park, and Hwalsuk Lee

Description: [Detection part]
We propose a novel text detector called CRAFT. The proposed method effectively detects text areas by exploring each individual character and the affinity between characters. To overcome the lack of individual character-level annotations, our framework exploits pseudo character-level bounding boxes acquired by the learned interim model in a weakly-supervised manner.
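The detector produces two score maps, a character region map and an affinity map, and word boxes are recovered by grouping the two. Below is a minimal sketch of that grouping step, assuming the maps are already available as NumPy arrays in [0, 1]; the thresholds and the OpenCV connected-component grouping are illustrative, not the authors' exact post-processing:

```python
import cv2
import numpy as np

def boxes_from_score_maps(region, affinity, text_thresh=0.7, link_thresh=0.4):
    """Group character regions into word boxes via the affinity map.

    region, affinity: float32 arrays in [0, 1], same shape, as produced
    by a CRAFT-style detector. Thresholds are illustrative values.
    Returns a list of 4-point rotated boxes (float32, shape (4, 2)).
    """
    # Binarize both maps and take their union: affinity links
    # neighbouring characters into one connected component.
    text_mask = region > text_thresh
    link_mask = affinity > link_thresh
    combined = np.clip(text_mask + link_mask, 0, 1).astype(np.uint8)

    num_labels, labels = cv2.connectedComponents(combined, connectivity=4)
    boxes = []
    for k in range(1, num_labels):  # label 0 is background
        component = labels == k
        if region[component].max() < text_thresh:
            continue  # drop link-only blobs with no strong character evidence
        ys, xs = np.where(component)
        pts = np.stack([xs, ys], axis=1).astype(np.float32)
        # Tightest rotated rectangle around the component.
        rect = cv2.minAreaRect(pts)
        boxes.append(cv2.boxPoints(rect))
    return boxes
```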
[Recognition part]
We used a thin-plate spline (TPS) based spatial transformer network (STN) to normalize the input text images, a ResNet-based feature extractor, a BiLSTM sequence model, and an attention-based decoder.
This model was designed based on our analysis of scene text recognition modules.
See our paper and source code.
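The four stages above compose in sequence (transformation, feature extraction, sequence modeling, prediction). A minimal PyTorch sketch of this composition follows, with an identity stand-in for the TPS-STN, a small CNN standing in for the ResNet backbone, and a linear head standing in for the attention decoder; all module sizes are illustrative assumptions, not the submission's configuration:

```python
import torch
import torch.nn as nn

class FourStageRecognizer(nn.Module):
    """Sketch of the Trans.-Feat.-Seq.-Pred. pipeline (illustrative sizes)."""

    def __init__(self, num_classes, hidden=256):
        super().__init__()
        # Trans.: a TPS-based STN would rectify the input image here;
        # an identity stand-in keeps the sketch short.
        self.transform = nn.Identity()
        # Feat.: small CNN standing in for the ResNet backbone.
        # (B, 1, 32, 100) -> (B, hidden, 1, 25)
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, hidden, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height to 1
        )
        # Seq.: BiLSTM reads the width axis as a character sequence.
        self.bilstm = nn.LSTM(hidden, hidden, bidirectional=True,
                              batch_first=True)
        # Pred.: linear head standing in for the attention decoder.
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                # x: (B, 1, 32, 100)
        x = self.transform(x)
        f = self.features(x).squeeze(2)  # (B, hidden, 25)
        f = f.permute(0, 2, 1)           # (B, 25, hidden)
        s, _ = self.bilstm(f)            # (B, 25, 2*hidden)
        return self.head(s)              # per-column class logits

# e.g. logits = FourStageRecognizer(num_classes=37)(torch.randn(2, 1, 32, 100))
```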
Note: this version (v3) was tested with a small input image size, and all training data was used for the recognition model.

Training Data
[Detection part]
We pre-trained our model CRAFT on SynthText, ICDAR 2013 FST, and ICDAR 2017 MLT, and fine-tuned it on some of the publicly released datasets of this year's ICDAR challenges: ArT, MLT, and ReCTS.
[Recognition part]
First, we generated Chinese synthetic datasets with the MJSynth and SynthText code, then pre-trained our model on the synthetic data together with the real datasets (ArT, LSVT, ReCTS, and RCTW). After that, we fine-tuned it on ReCTS data.
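The same pre-train-then-fine-tune pattern applies to both parts. A minimal PyTorch sketch of the recognition recipe described above, where the `Dataset` arguments, epoch counts, batch size, and the reduced fine-tuning learning rate are all illustrative assumptions, not the submission's settings:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader

def run_epochs(model, loader, optimizer, loss_fn, epochs):
    """Plain supervised loop; loader yields (image, target) batches."""
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            optimizer.zero_grad()
            loss_fn(model(images), targets).backward()
            optimizer.step()

def two_stage_training(model, loss_fn, synth_ds, art_ds, lsvt_ds,
                       rects_ds, rctw_ds, batch_size=192):
    # Stage 1: pre-train on synthetic data mixed with the real
    # datasets named in the description (ArT, LSVT, ReCTS, RCTW).
    pretrain = ConcatDataset([synth_ds, art_ds, lsvt_ds, rects_ds, rctw_ds])
    opt = torch.optim.Adadelta(model.parameters(), lr=1.0)
    run_epochs(model, DataLoader(pretrain, batch_size=batch_size, shuffle=True),
               opt, loss_fn, epochs=5)

    # Stage 2: fine-tune on ReCTS alone with a reduced learning rate.
    opt = torch.optim.Adadelta(model.parameters(), lr=0.1)
    run_epochs(model, DataLoader(rects_ds, batch_size=batch_size, shuffle=True),
               opt, loss_fn, epochs=2)
```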