method: BOE_AIoT_CTO (2020-08-10)
Authors: Guangwei Huang, Yue Li, Xiaojun Tang
Description: Our model trains PANNet and multi-oriented corner text detectors and ensembles their detection results over multi-scale images. We also pre-process the training and testing images to make them clearer. Multi-scale training and training data augmentation are used.
method: H&H Lab (2019-04-22)
Authors: HUST_VLRGROUP (Mengde Xu, Zhen Zhu, Hui Zhang, Mingkun Yang, Jiehua Yang) & HUAWEI_CLOUD_EI (Jing Wang, Yibin Ye, Shenggao Zhu, Dandan Tu)
Description: We ensemble EAST and a multi-oriented corner detector to create a robust scene text detector. To make network learning easier, we modified the multi-oriented corner network by adding a new branch borrowed from EAST.
method: A modified CTPN model 2.0 (2022-05-09)
Authors: Njoyim Tchoubith Peguy Calusha
Affiliation: University of Fribourg, Switzerland
Email: pegpeg07@hotmail.com
Description: The Connectionist Text Proposal Network (CTPN) published by Tian, Zhi, et al. introduces a vertical anchor mechanism that jointly predicts the location and text/non-text score of each fixed-width proposal, improving localization accuracy. It was originally created for scene text detection (ICDAR 2013 & 2015); the following enhancements have been made to tackle scanned receipt text localization:
- In the original CTPN architecture, there is no interaction between the localization and confidence layers. The output feature map of the localization layer has been incorporated into the computation of the confidence layer, making it focus more on meaningful regions.
- Due to the high positive and negative Jaccard overlap thresholds (0.7 and 0.5 respectively), the anchor matching strategy fails to match every ground-truth box, so the average number of matched anchors is low. To fix this, the positive Jaccard overlap threshold is decreased from 0.7 to 0.5 and the negative threshold from 0.5 to 0.3.
- The regression loss used in the CTPN is the smooth L1 loss. Although it is a good loss, it is still sensitive to outliers. That is why the balanced L1 loss was used instead.
- Because of the imbalance between the number of positive and negative anchors, λ1 from the regression loss is set to 4 to balance the loss terms.
- The number of channels of the RPN layer (the one that slides over the last convolutional maps, conv5, of the VGG16 model) is 256 instead of 512. This allows a larger image size during training and helps localize text well.
- The negative-to-positive anchor ratio was changed from 1:1 to 3:1. This was found to lead to faster optimization and more stable training.
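
The effect of the relaxed Jaccard overlap thresholds can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` and a simple "best overlap per anchor" rule; the helper names `iou` and `label_anchors` and the example boxes are hypothetical and are not taken from the submission's code.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thr, neg_thr):
    """Label each anchor: 1 = positive, 0 = negative, -1 = ignored."""
    labels = []
    for anchor in anchors:
        # Best overlap of this anchor with any ground-truth box.
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)  # between the thresholds: not used for training
    return labels


# Hypothetical example: one ground-truth text line and three 16-px-wide anchors.
gt_boxes = [(0, 0, 16, 20)]
anchors = [(0, 0, 16, 32), (0, 10, 16, 40), (100, 100, 116, 120)]

original = label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.5)
relaxed = label_anchors(anchors, gt_boxes, pos_thr=0.5, neg_thr=0.3)
```

In this toy case the first anchor overlaps the ground-truth box with a Jaccard index of 0.625, so it is ignored under the 0.7/0.5 thresholds and becomes a positive match under 0.5/0.3, which is the kind of unmatched ground-truth box the description refers to.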
As most scanned receipts contain dominant white space, which makes it difficult to localize text properly, the following crop preprocessing is applied:
1) Otsu's binarization (by using Sobel gradient)
2) Morphological operations (Structuring elements, MorphologyEx, Dilate, Erode)
3) Contour following
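
As a rough illustration of the cropping idea, the sketch below implements only Otsu's threshold selection in plain Python and then crops to the bounding box of the dark pixels. This is a deliberately simplified stand-in: it applies Otsu to raw intensities rather than to a Sobel gradient, and it replaces the morphological operations and contour following with a plain bounding box. The function names and the toy image are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the threshold t in 0..255 that maximizes between-class variance.

    The two classes are pixels with value <= t and pixels with value > t.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break  # no pixels left in the upper class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def content_bounds(image):
    """Return (top, bottom, left, right), inclusive, of the dark pixels.

    `image` is a list of rows of grayscale ints; returns None if no pixel
    falls at or below the Otsu threshold.
    """
    t = otsu_threshold([p for row in image for p in row])
    rows = [y for y, row in enumerate(image) if any(p <= t for p in row)]
    cols = [x for x in range(len(image[0])) if any(row[x] <= t for row in image)]
    if not rows or not cols:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]


# Hypothetical 6x6 "receipt": white background with a dark block of text.
image = [[255] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (1, 2, 3, 4):
        image[y][x] = 20

bounds = content_bounds(image)
```

A real pipeline would more likely rely on an image library for the gradient, morphology, and contour steps; the sketch only shows how a global threshold can separate printed content from the white margins.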
In addition to the usual post-processing (non-maximum suppression), empty boxes are removed based on their average white pixel intensity.
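
One possible reading of this filter is sketched below: a detected box is treated as empty when the mean grayscale intensity inside it is close to pure white. The cutoff `white_thr`, the box format `(x1, y1, x2, y2)` with exclusive right and bottom edges, and the function names are assumptions made for illustration; the description does not specify them.

```python
def mean_intensity(image, box):
    """Mean grayscale value inside box = (x1, y1, x2, y2), x2/y2 exclusive."""
    x1, y1, x2, y2 = box
    values = [p for row in image[y1:y2] for p in row[x1:x2]]
    # A degenerate box with no pixels is treated as fully white.
    return sum(values) / len(values) if values else 255.0


def drop_empty_boxes(image, boxes, white_thr=250.0):
    """Keep only boxes whose mean intensity is below the white cutoff."""
    return [b for b in boxes if mean_intensity(image, b) < white_thr]


# Hypothetical 4x8 image: white everywhere except a dark stroke in row 1.
image = [[255] * 8 for _ in range(4)]
for x in range(4):
    image[1][x] = 0

boxes = [(0, 0, 4, 2), (4, 0, 8, 2)]  # the second box covers only white pixels
kept = drop_empty_boxes(image, boxes)
```

Here the first box has a mean intensity of 127.5 and is kept, while the second is entirely white and is discarded.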
Date | Method | Recall | Precision | Hmean
---|---|---|---|---
2020-08-10 | BOE_AIoT_CTO | 98.76% | 98.92% | 98.84%
2019-04-22 | H&H Lab | 97.93% | 97.95% | 97.94%
2022-05-09 | A modified CTPN model 2.0 | 97.52% | 97.40% | 97.46%
2021-10-22 | A modified CTPN model 1.0 | 97.16% | 97.10% | 97.13%
2019-04-22 | GREAT-OCR Shanghai University | 96.62% | 96.21% | 96.42%
2019-04-21 | IFLYTEK-textDet_v3 | 93.77% | 95.89% | 94.81%
2019-04-22 | A Single-Shot Model for Robust Text Localization | 93.93% | 94.80% | 94.37%
2019-04-19 | BiLSTM Based on CTPN | 91.40% | 94.03% | 92.69%
2019-04-17 | EAST_clip_enhance_896_giou | 89.69% | 93.77% | 91.68%
2019-04-17 | Textline detection | 89.85% | 92.72% | 91.26%
2019-04-20 | A Text Localization Method Based on CTPN | 85.23% | 88.73% | 86.94%
2019-04-16 | Vsdnu | 85.07% | 87.17% | 86.11%
2019-04-17 | scene text detection weapon | 49.61% | 64.75% | 56.18%
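
The Hmean column is the harmonic mean of recall and precision. A short sketch, using a hypothetical helper `hmean` and the rounded percentages from three rows of the table, shows how the listed values can be reproduced. The leaderboard presumably computes Hmean from unrounded figures, so small last-digit differences are possible for other rows.

```python
def hmean(recall, precision):
    """Harmonic mean of recall and precision (both in the same unit)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (method, recall %, precision %) copied from the table.
rows = [
    ("BOE_AIoT_CTO", 98.76, 98.92),
    ("H&H Lab", 97.93, 97.95),
    ("scene text detection weapon", 49.61, 64.75),
]

for name, recall, precision in rows:
    print(f"{name}: {hmean(recall, precision):.2f}%")
```

Because the harmonic mean is pulled toward the smaller of the two values, a method with unbalanced recall and precision (such as the last row) scores lower than their arithmetic average would suggest.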