method: TextFuseNet (2020-07-31)

Authors: Jian Ye, Zhe Chen, Juhua Liu and Bo Du

Affiliation: Wuhan University, The University of Sydney

Email: liujuhua@whu.edu.cn

Description: Arbitrary-shape text detection in natural scenes is an extremely challenging task. Unlike existing text detection approaches that only perceive texts based on limited feature representations, we propose a novel framework, namely TextFuseNet, that exploits richer fused features for text detection. More specifically, we propose to perceive texts from three levels of feature representation, i.e., character-, word- and global-level, and then introduce a novel text representation fusion technique to help achieve robust arbitrary-shape text detection. The multi-level feature representation can adequately describe texts by dissecting them into individual characters while still maintaining their general semantics. TextFuseNet then collects and merges the texts' features from different levels using a multi-path fusion architecture which can effectively align and fuse the different representations. In practice, our proposed TextFuseNet can learn a more adequate description of arbitrary-shaped texts, suppressing false positives and producing more accurate detection results. Our proposed framework can also be trained with weak supervision for datasets that lack character-level annotations. Experiments on several datasets show that the proposed TextFuseNet achieves state-of-the-art performance. Specifically, we achieve an F-measure of 94.3% on ICDAR2013, 92.1% on ICDAR2015, 87.1% on Total-Text and 86.6% on CTW-1500, respectively.
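The multi-path fusion step described above can be sketched as follows. This is a toy illustration only, assuming sum-based fusion of three feature maps after nearest-neighbour alignment; the function name, shapes, and fusion operator are illustrative assumptions, not TextFuseNet's actual layers.

```python
import numpy as np

def fuse_levels(char_feat, word_feat, global_feat):
    """Toy multi-path fusion: align three (C, H, W) feature maps to a
    common spatial size, then fuse by element-wise summation.
    The sum-based fusion is an illustrative stand-in for the paper's
    learned fusion architecture."""
    target_h, target_w = char_feat.shape[1:]

    def resize(f):
        # nearest-neighbour upsampling to the target resolution
        c, h, w = f.shape
        ri = np.arange(target_h) * h // target_h
        ci = np.arange(target_w) * w // target_w
        return f[:, ri][:, :, ci]

    return resize(char_feat) + resize(word_feat) + resize(global_feat)

# Example: character-, word- and global-level maps at different resolutions
char_f   = np.ones((8, 32, 32))
word_f   = np.ones((8, 16, 16))
global_f = np.ones((8, 8, 8))
fused = fuse_levels(char_f, word_f, global_f)
print(fused.shape)  # (8, 32, 32)
```

In the real model the coarser levels carry semantics while the finer level carries localization, so the fused map benefits from both.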

method: PSENet_NJU_ImagineLab (2018-05-18)

Authors: Wenhai Wang, Xiang Li, Wenbo Hou, Tong Lu, Jian Yang

Description: A text detector based on semantic segmentation, trained using only the ICDAR 2017 MLT and ICDAR 2015 training sets. The paper is in preparation, and we will release our code later.

method: PixelLink (2017-09-13)

Authors: Dan Deng

Description: PixelLink: Detecting Scene Text via Instance Segmentation

Accepted by AAAI 2018

ABSTRACT:
Most state-of-the-art scene text detection algorithms are deep learning based methods that depend on bounding box regression and perform at least two kinds of predictions: text/non-text classification and location regression. Regression plays a key role in the acquisition of bounding boxes in these methods, but it is not indispensable, because text/non-text prediction can also be considered as a kind of semantic segmentation that contains full location information in itself. However, text instances in scene images often lie very close to each other, making them very difficult to separate via semantic segmentation. Therefore, instance segmentation is needed to address this problem. In this paper, PixelLink, a novel scene text detection algorithm based on instance segmentation, is proposed. Text instances are first segmented out by linking pixels within the same instance together. Text bounding boxes are then extracted directly from the segmentation result without location regression. Experiments show that, compared with regression based methods, PixelLink can achieve better or comparable performance on several benchmarks, while requiring much fewer training iterations and less training data.
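The core idea above — separating nearby text instances by predicting links between neighbouring text pixels and grouping only linked pixels — can be sketched with a simple flood fill. This is a minimal illustration assuming boolean text/link maps and only right/down links (the paper uses eight neighbours); all names and inputs are hypothetical.

```python
from collections import deque
import numpy as np

def link_pixels(text_mask, link_right, link_down):
    """Toy PixelLink-style grouping: text pixels are merged into one
    instance only when the link between them is also positive.
    Returns an integer label map (0 = background)."""
    h, w = text_mask.shape
    labels = np.zeros((h, w), dtype=int)
    nxt = 0
    for y in range(h):
        for x in range(w):
            if text_mask[y, x] and labels[y, x] == 0:
                nxt += 1
                labels[y, x] = nxt
                q = deque([(y, x)])
                while q:
                    cy, cx = q.popleft()
                    # follow positive links to in-bounds text neighbours
                    for ny, nx_, ok in (
                        (cy, cx + 1, cx + 1 < w and link_right[cy, cx]),
                        (cy, cx - 1, cx - 1 >= 0 and link_right[cy, cx - 1]),
                        (cy + 1, cx, cy + 1 < h and link_down[cy, cx]),
                        (cy - 1, cx, cy - 1 >= 0 and link_down[cy - 1, cx]),
                    ):
                        if ok and text_mask[ny, nx_] and labels[ny, nx_] == 0:
                            labels[ny, nx_] = nxt
                            q.append((ny, nx_))
    return labels

# Two adjacent text regions separated only by a negative link
mask = np.array([[1, 1, 1, 1]], dtype=bool)
lr   = np.array([[1, 0, 1, 0]], dtype=bool)  # no link between cols 1 and 2
ld   = np.zeros((1, 4), dtype=bool)
print(link_pixels(mask, lr, ld))  # [[1 1 2 2]]
```

Note how the negative link splits one connected run of text pixels into two instances — the situation plain semantic segmentation cannot resolve. Bounding boxes are then fitted to each labelled component directly, with no regression step.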

Using only the 1,000 images in IC15-train, the best Hmean is 83.7%; when SynthText is added for pretraining, it reaches 85%.

Ranking Table

Date        Method                                Recall   Precision  Hmean
2020-07-31  TextFuseNet                           90.56%   93.96%     92.23%
2018-05-18  PSENet_NJU_ImagineLab (single-scale)  85.22%   89.30%     87.21%
2017-09-13  PixelLink                             83.77%   86.65%     85.19%
2018-01-04  crpn                                  80.69%   88.77%     84.54%
2019-07-15  stela                                 78.57%   88.70%     83.33%
2019-04-10  EAST-VGG16                            81.27%   84.36%     82.79%
2020-08-14  DAL (multi-scale)                     80.45%   84.35%     82.36%
2020-08-13  DAL                                   79.49%   83.68%     81.53%
2017-07-31  EAST reimplementation with ResNet-50  77.32%   84.66%     80.83%
2017-01-23  RRPN-4                                77.13%   83.52%     80.20%
2016-10-28  RRPN-3                                73.23%   82.17%     77.44%
2019-07-23  std++ (single-scale)                  56.67%   71.64%     63.28%
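The Hmean column is the harmonic mean (F-measure) of recall and precision, which can be checked directly against any row of the table:

```python
def hmean(recall, precision):
    """Harmonic mean (F-measure) of recall and precision, in percent."""
    return 2 * recall * precision / (recall + precision)

# Top-ranked entry (TextFuseNet): recall 90.56%, precision 93.96%
print(round(hmean(90.56, 93.96), 2))  # 92.23
```

Because the harmonic mean is dominated by the smaller of its two inputs, a method cannot climb the ranking by trading recall for precision (or vice versa) too aggressively.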

Ranking Graphic