method: Lenovo-MI-Lab OCR 2020-07-09

Authors: Lenovo-MI-Lab

Description: The recognition model consists of an improved ResNet-50 feature extractor followed by a two-layer Bi-LSTM that generates the output sequence from the extracted features. In addition, a 1D attention module is employed to focus on the features corresponding to the text region. The model is trained on the following publicly available datasets: ICDAR2017 RCTW, ICDAR2019 ArT, ICDAR2019 LSVT, ICDAR2019 MLT, ICDAR2019 ReCTs, ICDAR2017 COCO-Text, CTW, CurvedSynthText. At test time, for images whose height is larger than their width, we first predict text from three inputs (the original image and the image rotated by 90 degrees and by -90 degrees); the predicted text with the highest confidence is then used as the final result.
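
The test-time rotation step described above can be sketched roughly as follows. This is a minimal illustration, not the submitted implementation: the image is represented as a plain list of pixel rows, `recognize` is a hypothetical callable standing in for the recognition model and is assumed to return a (text, confidence) pair, and the mapping of "90 degrees" to clockwise is an assumption. Images that are not taller than wide are assumed to be recognized once, without rotation.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # rows of pixel values; stand-in for a real image array
Recognizer = Callable[[Image], Tuple[str, float]]  # hypothetical: returns (text, confidence)


def rotate_cw(img: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate_ccw(img: Image) -> Image:
    """Rotate the image by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]


def recognize_with_rotation(img: Image, recognize: Recognizer) -> Tuple[str, float]:
    """Return the highest-confidence prediction over the candidate orientations."""
    height, width = len(img), len(img[0])
    candidates = [img]
    # Only tall images get the two extra rotated candidates.
    if height > width:
        candidates += [rotate_cw(img), rotate_ccw(img)]
    results = [recognize(candidate) for candidate in candidates]
    # Keep the (text, confidence) pair with the largest confidence.
    return max(results, key=lambda result: result[1])
```

With a real model, `recognize` would wrap the network's forward pass and decoding, and the confidence could be, for example, an aggregate of per-character probabilities; the source does not specify how confidence is computed.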