method: MCEM v3 (2019-04-30)

Authors: USTC-iFLYTEK

Description: We propose a two-stage, deep-learning-based scene text spotting approach. In the first stage, an ensemble of detectors locates arbitrarily shaped text lines in the image. In the second stage, each text line is cropped out using its control points and fed to a scene text recognition network, a standard attention-based encoder-decoder architecture. Our training set is the union of the CTW, RCTW, ReCTS, ArT, and LSVT datasets, all of which have been publicly released. In particular, pseudo labels are generated for the partially labelled LSVT images. To improve the final performance, features produced by different models are fused, and samples with low confidence are discarded.
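
To illustrate the second stage described above, the following is a minimal sketch of an attention-based encoder-decoder recognizer that reads a cropped text-line image. This is not the authors' implementation: the module layout, layer sizes, vocabulary size, and all names are illustrative assumptions, written here in PyTorch.

```python
# Minimal sketch (assumptions, not the submitted system): a CNN encoder turns a
# text-line crop into a feature sequence; a GRU decoder with additive attention
# emits one character per step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """CNN backbone producing a sequence of visual features along the width axis."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height, keep width as the time axis
        )

    def forward(self, x):                      # x: (B, 3, H, W)
        f = self.cnn(x)                        # (B, C, 1, W')
        return f.squeeze(2).permute(0, 2, 1)   # (B, W', C)

class AttentionDecoder(nn.Module):
    """GRU decoder with additive (Bahdanau-style) attention over encoder features."""
    def __init__(self, vocab_size, feat_dim=256, hid_dim=256, emb_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attn_enc = nn.Linear(feat_dim, hid_dim)
        self.attn_dec = nn.Linear(hid_dim, hid_dim)
        self.attn_v = nn.Linear(hid_dim, 1)
        self.gru = nn.GRUCell(emb_dim + feat_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, enc_feats, targets):     # enc_feats: (B, T, C), targets: (B, L)
        B = enc_feats.size(0)
        h = enc_feats.new_zeros(B, self.gru.hidden_size)
        logits = []
        for t in range(targets.size(1)):
            # Attention weights over the T encoder positions.
            score = self.attn_v(torch.tanh(
                self.attn_enc(enc_feats) + self.attn_dec(h).unsqueeze(1)))  # (B, T, 1)
            alpha = F.softmax(score, dim=1)
            context = (alpha * enc_feats).sum(dim=1)                        # (B, C)
            emb = self.embed(targets[:, t])                                 # teacher forcing
            h = self.gru(torch.cat([emb, context], dim=1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)       # (B, L, vocab_size)

# Toy forward pass on a fake batch of 32x128 text-line crops.
enc, dec = Encoder(), AttentionDecoder(vocab_size=6000)
feats = enc(torch.randn(2, 3, 32, 128))
logits = dec(feats, targets=torch.randint(0, 6000, (2, 10)))
print(logits.shape)  # torch.Size([2, 10, 6000])
```

At inference time the decoder would run autoregressively, feeding back its own predictions instead of the ground-truth targets used above for teacher forcing; ensemble feature fusion and confidence-based filtering, as mentioned in the description, sit on top of such a recognizer.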
Name - Organization
Xiangxiang Wang (王翔翔) - iFLYTEK (科大讯飞)
Shuai Shao (邵帅) - iFLYTEK (科大讯飞)
Hao Wu (吴浩) - iFLYTEK (科大讯飞)
Chenyu Liu (刘辰宇) - iFLYTEK (科大讯飞)
Yixing Zhu (朱意星) - USTC (中国科技大学)
Zhengyan Yang (杨争艳) - iFLYTEK (科大讯飞)
Changjie Wu (吴昌杰) - USTC (中国科技大学)
Mobai Xue (薛莫白) - USTC (中国科技大学)
Jiajia Wu (吴嘉嘉) - iFLYTEK (科大讯飞)
Bing Yin (殷兵) - iFLYTEK (科大讯飞)
Cong Liu (刘聪) - iFLYTEK (科大讯飞)
Jinshui Hu (胡金水) - iFLYTEK (科大讯飞)
Jun Du (杜俊) - USTC (中国科技大学)
Jianshu Zhang (张建树) - USTC (中国科技大学)
Lirong Dai (戴礼荣) - USTC (中国科技大学)