method: Task3-re5
Date: 2019-04-30

Authors: Yumei Li, Jianwei Wu, Wenhao He (angelicohe@tencent.com), Tao Xue, Long Liu

Description: Our method builds mainly on the direct-regression text detectors of [1] and [2], with a VGG-16 backbone. We first pretrain the CNN model on the CASIA-10K [2], MLT-17 [3], and RCTW-17 [4] datasets, and then fine-tune it on the competition training data. To improve performance on long Chinese text lines, we also use recognition information to improve the grouping of line segments.
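
The description does not say how recognition information enters the grouping step, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes axis-aligned segments that already carry recognized text, and a hypothetical callable score_text that rates how plausible a merged string is (for example, a recognizer confidence). The names Segment, group_segments, score_text and all thresholds are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """Hypothetical axis-aligned line segment with its recognized text."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def vertical_overlap(a: Segment, b: Segment) -> float:
    """Vertical intersection divided by the height of the shorter segment."""
    inter = min(a.y1, b.y1) - max(a.y0, b.y0)
    shorter = min(a.y1 - a.y0, b.y1 - b.y0)
    return max(inter, 0.0) / shorter if shorter > 0 else 0.0


def group_segments(
    segments: List[Segment],
    score_text: Callable[[str], float],
    max_gap_ratio: float = 1.0,   # assumed: allowed gap, in units of line height
    min_overlap: float = 0.6,     # assumed: required vertical overlap
    min_score: float = 0.5,       # assumed: required recognition plausibility
) -> List[Segment]:
    """Greedily merge left-to-right segments into text lines.

    A segment joins an existing line only if it is geometrically close
    AND the concatenated text is rated plausible by score_text.
    """
    lines: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.x0):
        for i, line in enumerate(lines):
            gap = seg.x0 - line.x1
            height = line.y1 - line.y0
            close = (gap <= max_gap_ratio * height
                     and vertical_overlap(line, seg) >= min_overlap)
            if close and score_text(line.text + seg.text) >= min_score:
                lines[i] = Segment(
                    line.x0, min(line.y0, seg.y0),
                    max(line.x1, seg.x1), max(line.y1, seg.y1),
                    line.text + seg.text,
                )
                break
        else:
            # No compatible line found: start a new one.
            lines.append(seg)
    return lines
```

In this sketch the recognition score acts as a veto on geometrically plausible merges, which is one simple way to keep a long line together without joining neighbouring but unrelated text.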

Organization: Tencent Map Big Data Lab Image Recognition Team

[1] He W, Zhang X Y, Yin F, et al. Deep direct regression for multi-oriented scene text detection[C]. Proceedings of the IEEE International Conference on Computer Vision. 2017: 745-753.
[2] He W, Zhang X Y, Yin F, et al. Multi-oriented and multi-lingual scene text detection with direct regression[J]. IEEE Transactions on Image Processing, 2018, 27(11): 5406-5419.
[3] Nayef N, Yin F, Bizid I, et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-RRC-MLT[C]. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017, 1: 1454-1459.
[4] Shi B, Yao C, Liao M, et al. ICDAR2017 competition on reading Chinese text in the wild (RCTW-17)[C]. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017, 1: 1429-1434.
[5] Gomez R, Shi B, Gomez L, et al. ICDAR2017 robust reading challenge on COCO-Text[C]. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2017, 1: 1435-1443.