method: CRAFTS 2019-06-03

Authors: Youngmin Baek, Seung Shin, Jeonghun Baek, Bado Lee, Chae Young Lee, and Hwalsuk Lee

Description: We propose a novel end-to-end text detection and recognition method called CRAFTS (Character Region Awareness For Text Spotting). CRAFTS is an end-to-end trainable network capable of detecting and recognizing text in multiple languages. The detection branch estimates the position and orientation of the text instances in the input image. Recognition is performed by an attention-based decoder that uses the text-area features pooled from the detection branch. Script identification is performed by taking the language that occurs most frequently among the recognized characters of the text. The text detector localizes text areas by exploring each character region and the affinities between characters. To overcome the lack of individual character-level annotations, our detection framework exploits pseudo character-level bounding boxes in a weakly supervised manner. The pseudo character-level bounding boxes are obtained by running inference with the learned interim model.

Clova AI OCR Team, NAVER/LINE Corp.
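The majority-vote script identification described above can be illustrated with a minimal Python sketch. This is not the CRAFTS implementation: the function names `char_script` and `identify_script` are hypothetical, and using the first word of each character's Unicode name as its script label is an assumption made only so the example is self-contained. A real system would presumably vote over the languages assigned to the characters predicted by the recognizer.

```python
import unicodedata
from collections import Counter


def char_script(ch):
    """Return a rough script label for one character, or None.

    Assumption: the first word of the Unicode character name
    (e.g. "LATIN", "HANGUL", "ARABIC", "CJK") stands in for the script.
    Non-alphabetic characters (digits, punctuation, spaces) do not vote.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def identify_script(word):
    """Pick the script that occurs most often among the characters of `word`."""
    votes = Counter(s for s in map(char_script, word) if s is not None)
    if not votes:
        return None  # no alphabetic characters, so no script can be assigned
    return votes.most_common(1)[0][0]
```

For example, `identify_script("한국어A")` counts three Hangul votes against one Latin vote, so the word is labeled as Hangul despite the stray Latin character.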

method: CRAFTS(Initial) 2019-05-28

Authors: Youngmin Baek, Seung Shin, Jeonghun Baek, Bado Lee, Chae Young Lee, and Hwalsuk Lee

Description: We propose a novel end-to-end text detection and recognition method called CRAFTS (Character Region Awareness For Text Spotting). CRAFTS is an end-to-end trainable network capable of detecting and recognizing text in multiple languages. The detection branch estimates the position and orientation of the text instances in the input image. Recognition is performed by an attention-based decoder that uses the text-area features pooled from the detection branch. Script identification is performed by taking the language that occurs most frequently among the recognized characters of the text. The text detector localizes text areas by exploring each character region and the affinities between characters. To overcome the lack of individual character-level annotations, our detection framework exploits pseudo character-level bounding boxes in a weakly supervised manner. The pseudo character-level bounding boxes are obtained by running inference with the learned interim model.

Clova AI OCR Team, NAVER/LINE Corp.

method: E2E-MLT 2019-05-22

Authors: Yash Patel, Michal Busta, Jiri Matas

Description: An end-to-end trainable (fully differentiable) method for multi-language scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks.
E2E-MLT is the first published multi-language OCR method for scene text. Although trained in a multi-language setup, E2E-MLT demonstrates competitive performance compared with other methods trained on English scene text alone. The experiments show that obtaining accurate multi-language, multi-script annotations is a challenging problem.

Ranking Table

Date        Method           Hmean   Precision  Recall  Average Precision  1-NED   1-NED (Case Sens.)  Hmean (Case Sens.)
2019-06-03  CRAFTS           51.74%  65.68%     42.68%  34.95%             48.27%  47.75%              50.74%
2019-05-28  CRAFTS(Initial)  46.99%  66.21%     36.41%  30.54%             42.52%  42.01%              45.97%
2019-05-22  E2E-MLT          26.46%  37.44%     20.47%  7.72%              26.39%  25.71%              24.85%
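As a reading aid for the table, the sketch below shows how two of the reported metrics are commonly computed. Hmean is the harmonic mean of precision and recall. Plugging in the rounded precision and recall values from each row reproduces the reported Hmean to within about 0.01 percentage points, with the small gaps attributable to rounding. The 1-NED function assumes the usual definition of one minus the average normalized edit distance over matched prediction and ground-truth pairs. The exact matching and normalization protocol used by this ranking is not stated here, so treat `one_minus_ned` and its input format as assumptions.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def one_minus_ned(pairs):
    """1 - mean normalized edit distance over (prediction, ground_truth) pairs.

    Assumption: each distance is normalized by the longer string's length.
    """
    if not pairs:
        return 0.0
    total = 0.0
    for pred, gt in pairs:
        longest = max(len(pred), len(gt))
        if longest:
            total += edit_distance(pred, gt) / longest
    return 1.0 - total / len(pairs)


# Precision and recall (in percent) copied from the ranking table.
rows = {
    "CRAFTS": (65.68, 42.68),
    "CRAFTS(Initial)": (66.21, 36.41),
    "E2E-MLT": (37.44, 20.47),
}
for method, (p, r) in rows.items():
    print(f"{method}: Hmean = {hmean(p, r):.2f}%")
```

Unlike Hmean, which scores a word as either right or wrong, 1-NED gives partial credit for transcriptions that are nearly correct, which is why the two columns can rank methods slightly differently.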

Ranking Graphic