method: AlimamaCV (2016-05-13)

Authors: Quan Chen, Tiezheng Ge, Zhiqiang Zhang, Minghui Li, Kun Gai

Description: We approach this task with a combination of three deep neural networks and a language model. Specifically, an LSTM model performs word recognition on the features generated by a CNN model. The final words are decoded by a bi-gram language model, and their locations are refined by a location regression network. Two internal text corpora are used in the training procedure. For the "strongly" and "weakly" versions, the corresponding given vocabulary is simply used as a filter on the final output.
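The bi-gram decoding step can be illustrated with a minimal sketch. The description does not say whether the bi-grams are over characters or words, how the model is smoothed, or how its score is combined with the recognizer's, so everything here is an assumption: a character-level bi-gram model with add-alpha smoothing, a linear combination of log scores, and the hypothetical names train_bigram, rescore and lm_weight.

```python
import math
from collections import Counter


def train_bigram(corpus, alpha=1.0):
    """Build a character bi-gram scorer from a list of words.

    Hypothetical helper: uses add-alpha smoothing, which is an
    assumption and not something the method description states.
    """
    unigrams, bigrams = Counter(), Counter()
    alphabet = set()
    for word in corpus:
        padded = "^" + word + "$"  # start and end markers
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            unigrams[a] += 1
            bigrams[(a, b)] += 1
    size = len(alphabet)

    def log_prob(word):
        # Sum of smoothed log P(next char | current char) over the word.
        padded = "^" + word + "$"
        return sum(
            math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * size))
            for a, b in zip(padded, padded[1:])
        )

    return log_prob


def rescore(candidates, log_prob, lm_weight=0.5):
    """Pick the best word from (word, recognizer_log_score) pairs.

    The combined score is recognizer score plus a weighted language
    model score; lm_weight is an illustrative value.
    """
    return max(candidates, key=lambda c: c[1] + lm_weight * log_prob(c[0]))[0]


if __name__ == "__main__":
    lm = train_bigram(["hotel", "hotel", "motel", "hot"])
    # The recognizer slightly prefers the misspelling; the language model corrects it.
    print(rescore([("hotel", -1.2), ("hqtel", -1.0)], lm))
```

In this toy setup the language model penalizes the unseen character pairs in "hqtel" strongly enough to outweigh the recognizer's small preference for it.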

method: Megvii-Image++ (2016-01-30)

Authors: Jia Yu, Xinyu Zhou, Cong Yao, Jianan Wu, Chi Zhang, Shuchang Zhou

Description: A deep neural network based system that consists of two main parts: text detection and word recognition. The detection part is accomplished by an FCN, which directly extracts text regions from the original images. The recognition part is another neural network that performs whole-word recognition.

method: Deep2Text II+ (2015-10-20)

Authors: Xu-Cheng Yin, Chun Yang, Jie-Bo Hou, Wei-Yi Pei, Zong-Heng Xing

Description: Text is first detected with USTB_TexStar+ (USTB_TexStar [1] with multiple channels). A hybrid word recognition system (a CNN recognizer [2] and an OCR engine) is then applied. Finally, the Levenshtein distance is used to find the nearest word in the given vocabulary, in order to correct some false recognition results.

[1] Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, "Robust text detection in natural scene images", IEEE Trans. Pattern Analysis and Machine Intelligence, 36(5): 970-983, 2014.
[2] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, "Reading Text in the Wild with Convolutional Neural Networks", arXiv preprint arXiv:1412.1842, 2014.
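The vocabulary correction step described for Deep2Text II+ can be sketched as follows. This is a minimal illustration of nearest-word lookup by Levenshtein distance, not the authors' implementation: the function names, the case-insensitive comparison, and the tie-breaking rule (first vocabulary entry wins) are all assumptions.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_word(word, vocabulary):
    """Return the vocabulary entry closest to word.

    Comparison ignores case (an assumption); ties go to the entry
    that appears first in vocabulary.
    """
    return min(vocabulary, key=lambda v: levenshtein(word.lower(), v.lower()))


if __name__ == "__main__":
    vocab = ["restaurant", "station", "hotel"]
    print(nearest_word("restaurnt", vocab))
```

A practical system would likely also reject matches whose distance exceeds some threshold, so that words outside the vocabulary are not forced onto an unrelated entry; the description does not say whether such a threshold was used.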

Ranking Table

Date        Method                           Recall   Precision  Hmean
2016-05-13  AlimamaCV                        79.49%   94.93%     86.53%
2016-01-30  Megvii-Image++                   79.00%   90.59%     84.40%
2015-10-20  Deep2Text II+                    73.78%   89.16%     80.75%
2015-04-02  Deep2Text II-1                   73.37%   80.97%     76.98%
2015-04-02  Stradvision-2                    70.86%   77.61%     74.08%
2015-04-02  StradVision-1                    67.87%   78.90%     72.97%
2015-03-30  Deep2Text I                      61.40%   83.46%     70.75%
2015-05-04  Baseline OpenCV 3.0 + Tesseract  32.82%   47.20%     38.72%
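The Hmean column is consistent with the harmonic mean of recall and precision, which the short sketch below recomputes from the percentages shown in the table. The function name hmean is illustrative, and because the published figures were presumably computed from raw counts rather than rounded percentages, small rounding differences are possible.

```python
def hmean(recall, precision):
    """Harmonic mean of recall and precision (both in the same units)."""
    if recall + precision == 0:
        return 0.0  # avoid division by zero when nothing was detected
    return 2 * recall * precision / (recall + precision)


if __name__ == "__main__":
    rows = [
        ("AlimamaCV", 79.49, 94.93),
        ("Baseline OpenCV 3.0 + Tesseract", 32.82, 47.20),
    ]
    for name, recall, precision in rows:
        print(f"{name}: {hmean(recall, precision):.2f}%")
```

The harmonic mean sits closer to the lower of the two values, which is why a method cannot rank well on precision alone if its recall is poor.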

Ranking Graphic