method: CLOVA OCR (2019-04-22)

Authors: Sungrae Park, Seung Shin, Seonghyeon Kim, Jaeheung Surh, Junyeop Lee, Hwalsuk Lee

Description: Our model consists of a ResNet-based backbone, a sequence model, and an attention-based decoder [1]. The backbone combines ResNet with SENet (squeeze-and-excitation network) [2]; the other components follow Baek et al. [1]. We trained the model on our own synthetic datasets using virtual adversarial training (VAT) [3]. For this competition, we fine-tuned the model on the SROIE training dataset. The recognizer reads the text inside the text boxes detected by CRAFTS [4].
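To make the squeeze-and-excitation idea in the backbone concrete, here is a minimal pure-Python sketch of the channel recalibration described in [2]. It is not the CLOVA implementation: the function name `se_recalibrate`, the weight matrices, and the omission of bias terms are all assumptions made for illustration.

```python
import math

def se_recalibrate(feature_map, w_reduce, w_expand):
    """Apply squeeze-and-excitation to a feature map laid out as [channel][row][col].

    w_reduce and w_expand are hypothetical fully connected weights (no biases).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expand to one gate per channel, squashed by a sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: multiply every activation in a channel by that channel's gate.
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]

# Two 2x2 channels and illustrative all-zero weights (bottleneck width 1).
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]]]
w_reduce = [[0.0, 0.0]]    # shape: bottleneck x channels
w_expand = [[0.0], [0.0]]  # shape: channels x bottleneck
recalibrated = se_recalibrate(features, w_reduce, w_expand)
```

With all-zero weights every gate is sigmoid(0) = 0.5, so each channel is simply halved; trained weights would instead emphasize some channels and suppress others.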

method: IFLYTEK-textRec_v4 (2019-04-22)

Authors: IFLYTEK

Description: An attention-based text recognizer is designed as an encoder-decoder framework. In the encoding stage, a CNN/LSTM transforms the image into a sequence of feature vectors, each corresponding to a region of the input image. In the decoding stage, the attention model first computes alignment factors from the history of target characters and the encoded feature vectors, and uses them to generate the synthesis (glimpse) vectors. A recurrent neural network (RNN) then generates the target characters from the glimpse vectors and the history of target characters.
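One decoding step of such an attention model can be sketched as follows. This is an illustration only: it assumes the history of target characters is summarized in a decoder state vector and uses dot-product scoring, neither of which is stated in the description, and the name `attention_step` is hypothetical.

```python
import math

def attention_step(decoder_state, encoded_features):
    """Return (alignment_factors, glimpse) for one decoding step."""
    # Score each encoded feature vector against the decoder state (dot product).
    scores = [
        sum(s * f for s, f in zip(decoder_state, feature))
        for feature in encoded_features
    ]
    # Softmax turns the scores into alignment factors that sum to 1.
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    alignment = [e / total for e in exps]
    # Glimpse: alignment-weighted sum of the encoded feature vectors.
    dim = len(encoded_features[0])
    glimpse = [
        sum(a * feature[i] for a, feature in zip(alignment, encoded_features))
        for i in range(dim)
    ]
    return alignment, glimpse

# Three image regions encoded as 2-dimensional feature vectors (made-up values).
encoded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
alignment, glimpse = attention_step(state, encoded)
```

Here the first and third regions receive equal, larger alignment factors than the second, so the glimpse vector is dominated by those two regions. An RNN would then consume the glimpse together with the previous character to predict the next one.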

method: A Text Extraction Method Based on Modified CRNN (2019-04-22)

Authors: Songyi Yang, Shengjie Xiu, Niansong Zhang

Description: We apply a modified CRNN to this task. CRNN is a conventional scene text recognition method consisting of convolutional layers, bidirectional LSTM layers, and a transcription layer, in that order. In scanned receipts, however, each text usually contains several words. We add the blank space between words to the alphabet for LSTM prediction, extending the network from single-word to multi-word recognition. We also double the input image width to tackle the overlap problem of long texts after max-pooling and stack one more LSTM layer, raising the per-character accuracy on the training set from 62% to 83%.
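The effect of adding the space character to the alphabet can be illustrated with a greedy decoder. The sketch below assumes the transcription layer is CTC, as in the standard CRNN; the alphabet contents, the blank index, and the helper names are hypothetical. Note that the space is an ordinary output class, distinct from the CTC blank symbol.

```python
CTC_BLANK = 0  # index reserved for the CTC "no character" symbol
# Illustrative alphabet; the space is a regular class at position 0.
ALPHABET = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ.:"

def greedy_ctc_decode(frame_indices):
    """Collapse repeated indices, then drop CTC blanks."""
    chars = []
    previous = CTC_BLANK
    for index in frame_indices:
        if index != previous and index != CTC_BLANK:
            # Classes 1..N map to ALPHABET[0..N-1].
            chars.append(ALPHABET[index - 1])
        previous = index
    return "".join(chars)

def encode(text):
    """Map characters to class indices (shifted by one to leave room for the blank)."""
    return [ALPHABET.index(ch) + 1 for ch in text]

# Simulate per-frame predictions: each character spans two frames,
# followed by one blank frame.
frames = []
for class_index in encode("TOLL 12"):
    frames += [class_index, class_index, CTC_BLANK]

decoded = greedy_ctc_decode(frames)
```

Because the space survives decoding like any other character, a single pass over the frame sequence yields a multi-word string rather than one word per crop.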

Ranking Table

Description Paper Source Code
2019-04-22 | CLOVA OCR | 94.30% | 94.88% | 94.59%
2019-04-22 | A Text Extraction Method Based on Modified CRNN | 26.33% | 72.53% | 38.63%
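
The score column headers did not survive in the table above, but the third percentage in each row is consistent with the harmonic mean of the first two. The short check below shows this; treating the first two values as a precision/recall pair is an assumption, and the function name `harmonic_mean` is illustrative.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two percentages."""
    return 2 * a * b / (a + b)

# First two score values from each ranking row.
rows = {
    "CLOVA OCR": (94.30, 94.88),
    "A Text Extraction Method Based on Modified CRNN": (26.33, 72.53),
}

for method, (first, second) in rows.items():
    print(f"{method}: {harmonic_mean(first, second):.2f}%")
# prints:
# CLOVA OCR: 94.59%
# A Text Extraction Method Based on Modified CRNN: 38.63%
```

Since the harmonic mean is symmetric, the check holds regardless of which of the two columns is precision and which is recall.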
