Authors: Pengfei Wang~*, Mengyi En*, Xiaoqiang Zhang*, Chengquan Zhang*
Affiliation: VIS-VAR Team, Baidu Inc.*; Xidian University~
Description: The method relies mainly on a two-stage text detector, LOMO, which is inspired by Mask R-CNN. LOMO introduces an iterative refinement module that refines the boundary of each text region one or more times during testing to obtain more accurate detections. ICDAR15 and part of KAIST are used as extra training data. Multi-scale testing is adopted, and the final result combines LOMO models with ResNet-50 and Inception-v4 backbones.
*This work was done while Pengfei Wang was an intern at Baidu Inc.
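
The description does not say how detections from multiple scales and from the two backbones are combined. One common approach, shown here only as a minimal sketch, is to map every detection back to original image coordinates, pool them, and apply greedy non-maximum suppression. The function names, the axis-aligned box format (x1, y1, x2, y2, score), and the 0.5 IoU threshold are all assumptions made for illustration; the actual submission may merge polygons differently.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2, score), axis-aligned for simplicity.
Box = Tuple[float, float, float, float, float]


def rescale(box: Box, scale: float) -> Box:
    """Map a box found on an image resized by `scale` back to original coordinates."""
    x1, y1, x2, y2, score = box
    return (x1 / scale, y1 / scale, x2 / scale, y2 / scale, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_detections(runs: List[Tuple[float, List[Box]]],
                     iou_thresh: float = 0.5) -> List[Box]:
    """Pool detections from several (scale, boxes) runs and apply greedy NMS.

    Each run may come from a different test scale or a different backbone.
    The threshold is an assumed value, not one taken from the description.
    """
    pooled = [rescale(b, scale) for scale, boxes in runs for b in boxes]
    pooled.sort(key=lambda b: b[4], reverse=True)  # highest score first
    kept: List[Box] = []
    for box in pooled:
        # Keep a box only if it does not heavily overlap a higher-scoring one.
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept


# Two runs: one at the original scale, one at 2x (e.g. a second backbone).
runs = [
    (1.0, [(10, 10, 110, 40, 0.90)]),
    (2.0, [(24, 22, 224, 84, 0.80), (400, 400, 600, 460, 0.70)]),
]
merged = merge_detections(runs)
```

In this toy input, the first box from the 2x run overlaps the 1x detection once rescaled and is suppressed, while the second survives as a separate text region.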