Results - ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction

method: CLOVA OCR (2019-04-22)

Authors: Sungrae Park, Seung Shin, Seonghyeon Kim, Jaeheung Surh, Junyeop Lee, Hwalsuk Lee

Description: Our model consists of a ResNet-based backbone, a sequence model, and an attention-based decoder [1]. The backbone combines ResNet with SENet (squeeze-and-excitation network) [2]; the other components follow Baek et al. [1]. We trained the model on our own synthetic datasets using virtual adversarial training (VAT) [3], then fine-tuned it on the SROIE training set for this competition. The recognizer reads the text inside the text boxes detected by CRAFTS [4].

[1] Baek, Jeonghun, et al. "What is wrong with scene text recognition model comparisons? Dataset and model analysis." arXiv preprint arXiv:1904.01906 (2019).
[2] Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[3] Miyato, Takeru, et al. "Virtual adversarial training: a regularization method for supervised and semi-supervised learning." IEEE Transactions on Pattern Analysis and Machine Intelligence (2018).
[4] Baek, Youngmin, et al. "Character Region Awareness for Text Detection." arXiv preprint arXiv:1904.01941 (2019).


method: IFLYTEK-textRec_v4 (2019-04-22)

Authors: IFLYTEK

Description: An attention-based text recognizer designed as an encoder-decoder framework. In the encoding stage, a CNN/LSTM transforms the image into a sequence of feature vectors, each corresponding to a region of the input image. In the decoding stage, the attention model first computes alignment factors from the history of target characters and the encoded feature vectors, and uses them to generate the synthesis vectors (also called glimpse vectors). A recurrent neural network (RNN) then generates the target characters from the glimpse vectors and the history of target characters.

Cheng, Z., Fan, B., Xu, Y., et al. "Focusing Attention: Towards Accurate Text Recognition in Natural Images." 2017.
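The decoding step described above can be illustrated with a minimal sketch of a single attention step. It is not the IFLYTEK implementation: the function names `softmax` and `attention_step` are hypothetical, the feature vectors and decoder state are toy values, and a plain dot product stands in for whatever learned scoring function the actual model uses.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(features, state):
    """One attention step over the encoded feature vectors.

    features: list of feature vectors, one per image region (from the encoder)
    state:    vector summarizing the history of target characters
    Returns the alignment factors and the glimpse (synthesis) vector.
    """
    # Assumption: dot-product scoring, chosen here only for brevity.
    scores = [sum(f_i * s_i for f_i, s_i in zip(f, state)) for f in features]
    alphas = softmax(scores)

    # The glimpse vector is the alignment-weighted sum of the feature vectors.
    dim = len(features[0])
    glimpse = [sum(a * f[d] for a, f in zip(alphas, features)) for d in range(dim)]
    return alphas, glimpse


# Toy input: three image regions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
alphas, glimpse = attention_step(features, state)
print(alphas)
print(glimpse)
```

In a full recognizer, the glimpse vector and the previous character would be fed to the RNN, which updates the state and emits the next character; that part is omitted here.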


method: CTPN_CRNN (2019-04-18)

Authors: caisiqi

Description: Text is first detected with CTPN. CRNN then recognizes each text-block image, producing a text sequence without spaces, and a post-processing step inserts the spaces between words in the sequence.

Shi, B., Bai, X., and Yao, C. "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298-2304, 2017.


Ranking Table


Date	Method	Recall	Precision	Hmean
2019-04-22	CLOVA OCR	94.30%	94.88%	94.59%
2019-04-22	IFLYTEK-textRec_v4	80.63%	81.72%	81.17%
2019-04-18	CTPN_CRNN	35.75%	63.89%	45.85%
2019-04-22	A Text Extraction Method Based on Modified CRNN	26.33%	72.53%	38.63%
2019-04-18	BiLSTM+ctc	28.75%	49.69%	36.42%
2019-04-16	VIL	0.00%	0.00%	0.00%
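The Hmean column is consistent with the harmonic mean of Recall and Precision. The short sketch below recomputes it for a few rows of the table as a sanity check; the function name `hmean` is hypothetical and the zero-division guard is an assumption about how an all-zero entry such as VIL would be scored.

```python
def hmean(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision, both given in percent."""
    if recall + precision == 0:
        return 0.0  # assumption: an all-zero result scores 0 instead of dividing by zero
    return 2 * recall * precision / (recall + precision)


# (recall, precision) pairs copied from the ranking table above.
rows = {
    "CLOVA OCR": (94.30, 94.88),
    "IFLYTEK-textRec_v4": (80.63, 81.72),
    "CTPN_CRNN": (35.75, 63.89),
    "VIL": (0.00, 0.00),
}

for method, (recall, precision) in rows.items():
    print(f"{method}: {hmean(recall, precision):.2f}%")
```

Because the harmonic mean is pulled toward the smaller of the two values, an entry with high precision but low recall, such as CTPN_CRNN, still ends up with a low Hmean.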
