method: Task4-re3 2019-04-30

Authors: Yumei Li, Jianwei Wu, Wenhao He (angelicohe@tencent.com), Tao Xue, Long Liu

Description: Our method has two parts: detection and recognition. The detection method is mainly based on previous works [1] and [2], with a VGG-16 backbone. To train the CNN model, we first pretrain on the CASIA-10K [2], MLT-17 [3] and RCTW-17 [4] datasets, and then fine-tune on the training data from the competition. To improve performance on long Chinese text lines, we also use recognition information to group line segments more accurately. The recognition method combines the results of a CNN-based model [6] and an LSTM-attention-based model [7]. The recognition model is trained on CASIA-10K, MLT-17, RCTW-17 and synthetic data, together with the training samples from this competition.
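
The description does not say how the outputs of the two recognizers are combined. As an illustration only, the sketch below assumes each model returns a string with a confidence score and merges them by confidence-weighted voting. The names Hypothesis and combine, and the voting rule itself, are hypothetical and not taken from the submission.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    text: str          # string recognized by one model
    confidence: float  # score in [0, 1]; assumed comparable across models


def combine(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Merge per-model hypotheses for one text line (hypothetical rule)."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    # Models that agree on the same string pool their confidence.
    votes: Dict[str, float] = {}
    for h in hypotheses:
        votes[h.text] = votes.get(h.text, 0.0) + h.confidence
    # Pick the string with the largest pooled confidence
    # (on a tie, the string seen first wins).
    best_text = max(votes, key=lambda t: votes[t])
    # Report the strongest single-model confidence for the winning string.
    best_conf = max(h.confidence for h in hypotheses if h.text == best_text)
    return Hypothesis(best_text, best_conf)


if __name__ == "__main__":
    cnn_out = Hypothesis("OPEN 24H", 0.62)        # e.g. CNN-based model
    attention_out = Hypothesis("OPEN 24H", 0.55)  # e.g. LSTM-attention model
    print(combine([cnn_out, attention_out]))
```

With only two models this rule reduces to taking the more confident output when they disagree. A real system might instead rescore both candidates with a language model or calibrate the scores first, since raw confidences from different architectures are not necessarily comparable.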

Organization: Tencent Map Big Data Lab Image Recognition Team

References

[1] He W, Zhang X Y, Yin F, et al. Deep direct regression for multi-oriented scene text detection[C]. Proceedings of the IEEE International Conference on Computer Vision. 2017: 745-753.
[2] He W, Zhang X Y, Yin F, et al. Multi-oriented and multi-lingual scene text detection with direct regression[J]. IEEE Transactions on Image Processing, 2018, 27(11): 5406-5419.
[3] Nayef N, Yin F, Bizid I, et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-RRC-MLT[C]. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017, 1: 1454-1459.
[4] Shi B, Yao C, Liao M, et al. ICDAR2017 competition on reading Chinese text in the wild (RCTW-17)[C]. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017, 1: 1429-1434.
[5] Gomez R, Shi B, Gomez L, et al. ICDAR2017 robust reading challenge on COCO-Text[C]. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017, 1: 1435-1443.
[6] Yin F, Wu Y C, Zhang X Y, et al. Scene text recognition with sliding convolutional character models[J]. arXiv preprint arXiv:1709.01727, 2017. http://arxiv.org/abs/1709.01727
[7] Wojna Z, Gorban A N, Lee D S, et al. Attention-based extraction of structured information from street view imagery[C]. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017, 1: 844-850.