Results - ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction

method: BOE_AIoT_CTO (2020-08-10)

Authors: Guangwei Huang, Yue Li, Xiaojun Tang

Description: We train PANNet and multi-oriented corner text detectors and ensemble their detection results on multi-scale images. In addition, we pre-process the training and testing images to make them clearer. Multi-scale training and training data augmentation are used.
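
The description does not say how the multi-scale results are fused. A common choice, assumed here, is to map every detection back to the original image resolution, pool the boxes from all detectors and scales, and apply greedy non-maximum suppression (NMS). The sketch below uses axis-aligned boxes for simplicity (PAN and the corner-based detector can output polygons or rotated boxes), and the function names and the 0.5 IoU threshold are illustrative assumptions, not the authors' settings.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2, score) in pixel coordinates.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rescale(boxes: List[Box], scale: float) -> List[Box]:
    """Map boxes predicted on an image resized by `scale` back to original coordinates."""
    return [(x1 / scale, y1 / scale, x2 / scale, y2 / scale, s) for x1, y1, x2, y2, s in boxes]


def nms(boxes: List[Box], iou_thresh: float = 0.5) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box and drop boxes that overlap it too much."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept


def ensemble(runs: List[Tuple[float, List[Box]]], iou_thresh: float = 0.5) -> List[Box]:
    """Pool detections of every (scale, boxes) run in original coordinates, then apply NMS."""
    pooled: List[Box] = []
    for scale, boxes in runs:
        pooled.extend(rescale(boxes, scale))
    return nms(pooled, iou_thresh)


# Hypothetical outputs: two runs find the same text line; the second run also finds another one.
runs = [
    (1.0, [(10, 10, 110, 30, 0.90)]),                              # e.g. detector A at scale 1.0
    (2.0, [(22, 20, 222, 60, 0.95), (300, 100, 400, 140, 0.80)]),  # e.g. detector B at scale 2.0
]
print(ensemble(runs))
```

The duplicate text line is kept once (from the higher-scoring run), and the box found only at scale 2.0 survives in original-image coordinates.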

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation

method: H&H Lab (2019-04-22)

Authors: HUST_VLRGROUP (Mengde Xu, Zhen Zhu, Hui Zhang, Mingkun Yang, Jiehua Yang) & HUAWEI_CLOUD_EI (Jing Wang, Yibin Ye, Shenggao Zhu, Dandan Tu)

Description: We ensemble EAST and the multi-oriented corner detector to create a robust scene text detector. To make the network easier to train, we modified the multi-oriented corner network by adding a new branch borrowed from EAST.

Lyu, Pengyuan, et al. "Multi-oriented scene text detection via corner localization and region segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

Zhou, Xinyu, et al. "EAST: an efficient and accurate scene text detector." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

method: A modified CTPN model 2.0 (2022-05-09)

Authors: Njoyim Tchoubith Peguy Calusha

Affiliation: University of Fribourg, Switzerland

Email: pegpeg07@hotmail.com

Description: The Connectionist Text Proposal Network (CTPN) proposed by Tian, Zhi, et al. uses a vertical anchor mechanism that jointly predicts the location and the text/non-text score of each fixed-width proposal, which improves localization accuracy. CTPN was originally designed for scene text detection (ICDAR 2013 and 2015); the following enhancements were made to adapt it to scanned receipt text localization:

- In the original CTPN architecture, there is no interaction between the localization and confidence layers. The output feature map of the localization layer has been incorporated into the computation of the confidence layer, making it focus more on meaningful regions.

- Because the positive and negative Jaccard overlap (IoU) thresholds are high (0.7 and 0.5, respectively), the anchor matching strategy fails to match every ground-truth box, so the average number of matched anchors is low. To fix this, the positive threshold was lowered from 0.7 to 0.5 and the negative threshold from 0.5 to 0.3.

- The regression loss used in CTPN is the smooth L1 loss. Although it works well, it is still affected by outliers, so the balanced L1 loss was used instead.

- Because of the imbalance between the numbers of positive and negative anchors, the weight λ1 of the regression loss is set to 4 to balance the loss terms.

- The RPN layer (the layer that slides over conv5, the last convolutional feature maps of the VGG16 model) uses 256 channels instead of 512. This makes it possible to train with larger images and helps localize text well.

- The negative-to-positive sampling ratio was changed from 1:1 to 3:1, which was found to give faster optimization and more stable training.
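
The anchor and loss changes above can be illustrated with a small framework-free sketch. It follows the anchor design described in the CTPN paper (16-px-wide anchors, ten heights starting at 11 px, each 1/0.7 times the previous one, up to about 273 px) and CTPN's vertical regression targets, and applies the modified 0.5/0.3 IoU thresholds, 3:1 negative:positive sampling, the balanced L1 loss and λ1 = 4. Assumptions not taken from the description: IoU is computed on vertical extents only (anchor and ground-truth slice lie in the same 16-px column), the mini-batch size of 128 and the Libra R-CNN defaults α = 0.5, γ = 1.5 are placeholders, CTPN's side-refinement term is left out, and all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Span = Tuple[float, float]  # vertical extent (y1, y2) of a 16-px-wide anchor or ground-truth slice


def anchor_heights(n: int = 10, h0: float = 11.0) -> List[float]:
    """CTPN anchor heights: start at 11 px and divide by 0.7 each time (11 ... ~273 px)."""
    return [h0 / 0.7 ** k for k in range(n)]


def vertical_iou(a: Span, b: Span) -> float:
    """IoU of two slices in the same 16-px column, so only the vertical extents matter."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(ious: List[float], pos_thresh: float = 0.5, neg_thresh: float = 0.3) -> List[int]:
    """1 = text (IoU > pos_thresh), 0 = non-text (IoU < neg_thresh), -1 = ignored in between."""
    return [1 if v > pos_thresh else 0 if v < neg_thresh else -1 for v in ious]


def sample_anchors(labels: List[int], batch: int = 128, neg_per_pos: int = 3, seed: int = 0) -> List[int]:
    """Indices of anchors used in the loss, drawn at a negative:positive ratio of 3:1."""
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    n_pos = min(len(pos), batch // (neg_per_pos + 1))
    n_neg = min(len(neg), neg_per_pos * n_pos)
    return rng.sample(pos, n_pos) + rng.sample(neg, n_neg)


def encode(gt: Span, anchor: Span) -> Tuple[float, float]:
    """CTPN vertical regression targets: v_c = (c_y - c_y^a) / h^a and v_h = log(h / h^a)."""
    cy, h = (gt[0] + gt[1]) / 2, gt[1] - gt[0]
    cya, ha = (anchor[0] + anchor[1]) / 2, anchor[1] - anchor[0]
    return (cy - cya) / ha, math.log(h / ha)


def balanced_l1(x: float, alpha: float = 0.5, gamma: float = 1.5) -> float:
    """Balanced L1 loss (Libra R-CNN); b makes the gradient continuous at |x| = 1."""
    b = math.exp(gamma / alpha) - 1
    x = abs(x)
    if x < 1:
        return alpha / b * (b * x + 1) * math.log(b * x + 1) - alpha * x
    return gamma * x + gamma / b - alpha


def total_loss(cls_losses: List[float], reg_losses: List[float], lambda1: float = 4.0) -> float:
    """Mean classification loss plus lambda1 times the mean vertical regression loss."""
    l_cls = sum(cls_losses) / max(len(cls_losses), 1)
    l_reg = sum(reg_losses) / max(len(reg_losses), 1)
    return l_cls + lambda1 * l_reg


# One 30-px-tall ground-truth slice and ten anchors centred on the same row.
gt = (100.0, 130.0)
anchors = [(115 - h / 2, 115 + h / 2) for h in anchor_heights()]
ious = [vertical_iou(a, gt) for a in anchors]
print(label_anchors(ious))            # → [-1, 1, 1, 1, 1, -1, -1, 0, 0, 0]
print(label_anchors(ious, 0.7, 0.5))  # → [0, -1, 1, 1, -1, 0, 0, 0, 0, 0] (original CTPN thresholds)
```

In this toy example the lowered thresholds make four of the ten anchors positive instead of two, which is the effect the threshold change aims for.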

Because most scanned receipts contain large areas of white space, which makes it difficult to localize text properly, the following cropping pre-processing was applied:

1) Otsu's binarization (using the Sobel gradient)
2) Morphological operations (Structuring elements, MorphologyEx, Dilate, Erode)
3) Contour following
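
A framework-free sketch of this cropping step, assuming a grayscale image stored as a list of rows (0 = black, 255 = white). It computes a Sobel gradient magnitude, binarizes it with Otsu's threshold, closes the mask (dilate, then erode) with a 3×3 rectangular structuring element, and crops to the extent of the remaining foreground. As a simplification, the bounding box of all foreground pixels stands in for contour following, and the kernel size is an assumption. The step names in the list match OpenCV functions (cv2.Sobel, cv2.threshold with cv2.THRESH_OTSU, cv2.morphologyEx, cv2.dilate, cv2.erode, cv2.findContours), which a real pipeline would more likely use.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 = black ... 255 = white
Mask = List[List[bool]]


def sobel_magnitude(img: Image) -> Image:
    """|Gx| + |Gy| using 3x3 Sobel kernels, clipped to 255; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        up, mid, down = img[y - 1], img[y], img[y + 1]
        for x in range(1, w - 1):
            gx = (up[x + 1] + 2 * mid[x + 1] + down[x + 1]) - (up[x - 1] + 2 * mid[x - 1] + down[x - 1])
            gy = (down[x - 1] + 2 * down[x] + down[x + 1]) - (up[x - 1] + 2 * up[x] + up[x + 1])
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out


def otsu_threshold(img: Image) -> int:
    """Threshold t maximizing the between-class variance; foreground is v > t."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total, sum_all = sum(hist), sum(i * c for i, c in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def morph(mask: Mask, k: int, dilate: bool) -> Mask:
    """Dilation (any pixel set) or erosion (all pixels set) with a k x k rectangle."""
    h, w, r = len(mask), len(mask[0]), k // 2
    op = any if dilate else all
    return [[op(mask[yy][xx] for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]


def crop_box(img: Image, k: int = 3) -> Tuple[int, int, int, int]:
    """(x1, y1, x2, y2), right/bottom exclusive, enclosing the detected text region."""
    grad = sobel_magnitude(img)
    t = otsu_threshold(grad)
    mask = [[v > t for v in row] for row in grad]
    mask = morph(morph(mask, k, dilate=True), k, dilate=False)  # closing joins nearby strokes
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not ys:
        return 0, 0, len(img[0]), len(img)
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


# Toy "receipt": a 20x20 white page with one dark text block in rows 8-11, columns 5-14.
img = [[255] * 20 for _ in range(20)]
for y in range(8, 12):
    for x in range(5, 15):
        img[y][x] = 0
print(crop_box(img))
```

The crop keeps a one-pixel margin around the dark block because the Sobel response extends one pixel past each edge.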

In addition to the standard post-processing (non-maximum suppression), empty boxes are removed based on their average white-pixel intensity.
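
A minimal sketch of the empty-box filter, assuming NMS has already been applied, a grayscale image stored as a list of rows (0 = black, 255 = white), and boxes given as (x1, y1, x2, y2) with exclusive right/bottom edges. The cut-off of 250 is a hypothetical value; the description does not state the threshold used.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale rows, 0 = black ... 255 = white
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), right/bottom exclusive


def mean_intensity(img: Image, box: Box) -> float:
    """Average gray level inside the box; an empty box counts as pure white."""
    x1, y1, x2, y2 = box
    pixels = [v for row in img[y1:y2] for v in row[x1:x2]]
    return sum(pixels) / len(pixels) if pixels else 255.0


def drop_empty(img: Image, boxes: List[Box], white_thresh: float = 250.0) -> List[Box]:
    """Keep only boxes dark enough to contain ink; near-white boxes are treated as empty."""
    return [b for b in boxes if mean_intensity(img, b) < white_thresh]


# 10x10 white page with a 2x2 black mark at rows 2-3, columns 2-3.
page = [[255] * 10 for _ in range(10)]
for y in range(2, 4):
    for x in range(2, 4):
        page[y][x] = 0
print(drop_empty(page, [(1, 1, 5, 5), (6, 6, 9, 9), (3, 3, 3, 3)]))
```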

Tian, Zhi, et al. "Detecting text in natural image with connectionist text proposal network." European Conference on Computer Vision. 2016.

Ranking Table

Date	Method	Recall	Precision	Hmean
2020-08-10	BOE_AIoT_CTO	98.76%	98.92%	98.84%
2019-04-22	H&H Lab	97.93%	97.95%	97.94%
2022-05-09	A modified CTPN model 2.0	97.52%	97.40%	97.46%
2021-10-22	A modified CTPN model 1.0	97.16%	97.10%	97.13%
2020-09-27	only PAN	96.51%	96.80%	96.66%
2021-01-28	58CV	97.48%	95.43%	96.45%
2019-04-22	GREAT-OCR Shanghai University	96.62%	96.21%	96.42%
2019-04-21	IFLYTEK-textDet_v3	93.77%	95.89%	94.81%
2019-04-22	A Single-Shot Model for Robust Text Localization	93.93%	94.80%	94.37%
2019-04-19	BiLSTM Based on CTPN	91.40%	94.03%	92.69%
2019-04-17	EAST_clip_enhance_896_giou	89.69%	93.77%	91.68%
2019-04-17	Textline detection	89.85%	92.72%	91.26%
2019-04-20	A Text Localization Method Based on CTPN	85.23%	88.73%	86.94%
2019-04-16	Vsdnu	85.07%	87.17%	86.11%
2021-05-10	Original CRAFT for SROIE	62.73%	59.94%	61.31%
2019-04-17	scene text detection weapon	49.61%	64.75%	56.18%
2021-04-13	Practicing project for Scientific Research Subject (HCMUS master program)	37.02%	30.07%	33.19%
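
Hmean is the harmonic mean of recall and precision, H = 2RP / (R + P). The snippet below recomputes it for a few rows of the table. Because the published recall and precision are rounded to two decimals, a recomputed value can differ from the reported Hmean in the last digit (for 58CV it gives 96.44 rather than 96.45).

```python
def hmean(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (the F1 score); inputs and output in percent."""
    return 2 * recall * precision / (recall + precision)


# (recall, precision, reported Hmean) copied from the ranking table above.
rows = {
    "BOE_AIoT_CTO": (98.76, 98.92, 98.84),
    "H&H Lab": (97.93, 97.95, 97.94),
    "A modified CTPN model 2.0": (97.52, 97.40, 97.46),
    "58CV": (97.48, 95.43, 96.45),
}
for name, (r, p, reported) in rows.items():
    print(f"{name}: recomputed {hmean(r, p):.2f}, reported {reported:.2f}")
```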