method: SituTech_OCR 2021-03-11

Authors: Kui Lyu, Chuanhe Liu

Affiliation: Beijing Situ Vision Technologies Co. Ltd


Description: In this work, we design an elegant text detection model. Our detector is similar to DBNet, but there are some differences. Specifically, we introduce an advanced detector backbone, the classic EfficientDet network, which offers flexible scales and a stronger ability to extract features. We also optimize the label generation strategy of DBNet. In the original work, both the generation of the positive area and the expansion of the positive area to the bounding box use the Vatti clipping algorithm, which is less robust across different area-to-perimeter ratios. We optimized this function to make the label transform between the positive area and the bounding box more reasonable.
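To illustrate the area-to-perimeter sensitivity mentioned above, here is a minimal sketch of the offset rule from the original DBNet label generation, D = A(1 - r^2)/L, where A is the polygon area, L its perimeter and r the shrink ratio (0.4 in the DBNet paper). It does not reproduce the optimized function of this submission, which is not described here. The function names and the two example polygons are assumptions made for illustration only.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def shrink_offset(points, ratio=0.4):
    """Offset D = A * (1 - r^2) / L from the original DBNet labels."""
    area = polygon_area(points)
    perimeter = polygon_perimeter(points)
    return area * (1.0 - ratio ** 2) / perimeter


# Two hypothetical text boxes with the same area but different shapes.
square = [(0, 0), (100, 0), (100, 100), (0, 100)]
strip = [(0, 0), (1000, 0), (1000, 10), (0, 10)]

for name, box in (("square", square), ("strip", strip)):
    offset = shrink_offset(box)
    short_side = min(box[2][0], box[2][1])
    print(f"{name}: offset={offset:.2f}, fraction of short side={offset / short_side:.2f}")
```

With equal areas, the square is shrunk by about a fifth of its side on each edge, while the thin strip loses roughly two fifths of its height on each edge, which shows how the fixed rule behaves differently as the area-to-perimeter ratio changes.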

If you have any questions, please contact us.
SituAIgorithm Team, Beijing Situ Vision Technologies Co. Ltd

method: CPN (multi-scale) 2024-05-30

Authors: Longhuang Wu, Shangxuan Tian, Youxin Wang, Pengfei Xiong


Description: We propose a Complementary Proposal Network (CPN) that seamlessly integrates semantic and geometric information in parallel for superior performance. This result is achieved with a single Swin-L backbone and a multi-scale testing policy. No model ensemble is used.

method: CRAFTS 2019-06-03

Authors: Youngmin Baek, Seung Shin, Jeonghun Baek, Bado Lee, Chae Young Lee, and Hwalsuk Lee

Description: We propose a novel end-to-end text detection and recognition method called CRAFTS (Character Region Awareness For Text Spotting). CRAFTS is an end-to-end trainable network capable of detecting and recognizing text in multiple languages. The detection branch estimates the position and orientation of the text in the input image. Recognition is performed by an attention-based decoder that uses the pooled text-area features from the detection branch. Script identification is performed by selecting the language that occurs most frequently among the characters in the text. The text detector detects text areas by exploring each character region and the affinities between characters. To overcome the lack of individual character-level annotations, our detection framework exploits pseudo character-level bounding boxes in a weakly supervised manner. The pseudo character-level bounding boxes are obtained by running inference with the learned interim model.
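The script identification step described above amounts to a majority vote over per-character language labels. A minimal sketch follows, assuming each recognized character already carries a language label; the function name, the label strings and the tie-breaking behavior are illustrative assumptions, not part of CRAFTS.

```python
from collections import Counter


def identify_script(char_languages):
    """Return the most frequent language label among the characters.

    char_languages: one language label per recognized character
    (hypothetical input format). Returns None for an empty word.
    Ties are resolved in favor of the label seen first, which is an
    assumption of this sketch rather than documented behavior.
    """
    counts = Counter(char_languages)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical per-character labels for one detected word.
labels = ["Latin", "Latin", "Korean", "Latin"]
print(identify_script(labels))
```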

Clova AI OCR Team, NAVER/LINE Corp.

Ranking Table

Date | Method | Hmean | Precision | Recall | Average Precision
2024-05-30 | CPN (multi-scale) | 53.53% | 37.13% | 95.88% | 93.85%
2019-05-31 | A two-stage text detector based on cascade rcnn | 47.54% | 31.76% | 94.51% | 91.13%
2019-05-26 | two stage text detector | 46.64% | 30.99% | 94.20% | 89.16%
2019-06-02 | A two-stage text detector based on cascade rcnn(using total 10000 images of mlt19) | 46.16% | 30.53% | 94.58% | 90.87%
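The Hmean column is the harmonic mean of Precision and Recall. As a quick check, the sketch below recomputes it from the precision and recall values reported in the table; the function name and the data layout are assumptions made for this example.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, precision %, recall %) copied from the ranking table.
rows = [
    ("CPN (multi-scale)", 37.13, 95.88),
    ("A two-stage text detector based on cascade rcnn", 31.76, 94.51),
    ("two stage text detector", 30.99, 94.20),
    ("A two-stage text detector based on cascade rcnn (10000 mlt19 images)", 30.53, 94.58),
]

for method, precision, recall in rows:
    print(f"{method}: Hmean = {hmean(precision, recall):.2f}%")
```

Because precision is much lower than recall for every entry, the harmonic mean stays close to the lower of the two values.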

Ranking Graphic