Results - Text in Videos - Robust Reading Competition

method: Megvii-Image++ (2016-04-13)

Authors: Jia Yu, Xinyu Zhou, Cong Yao, Jianan Wu, Chi Zhang, Shuchang Zhou

Description: Detection is performed by a fully convolutional network (FCN) that extracts text regions directly from the original images. The tracker is a network-flow-based association algorithm. Recognition is performed by a second neural network that recognizes whole words.
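To illustrate what the association step in a tracker does, here is a minimal sketch that links detections in a new frame to existing tracks by box overlap. It is not the authors' method: it uses greedy IoU matching between two consecutive frames as a much simpler stand-in for a network-flow formulation, and the names `iou`, `associate`, `min_iou` and `next_id` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    next_id: int,
    min_iou: float = 0.3,  # assumed threshold, not taken from the source
) -> Tuple[Dict[int, Box], int]:
    """Greedily match detections to tracks by descending IoU.

    Unmatched detections start new tracks. Returns the updated tracks
    and the next unused track id.
    """
    candidates = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    updated: Dict[int, Box] = {}
    used_tracks, used_dets = set(), set()
    for score, tid, j in candidates:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if tid in used_tracks or j in used_dets:
            continue
        updated[tid] = detections[j]
        used_tracks.add(tid)
        used_dets.add(j)
    for j, det in enumerate(detections):
        if j not in used_dets:
            updated[next_id] = det
            next_id += 1
    return updated, next_id


previous = {0: (0, 0, 10, 10), 1: (50, 50, 60, 60)}
current = [(51, 50, 61, 60), (1, 0, 11, 10), (100, 100, 110, 110)]
tracks, next_id = associate(previous, current, next_id=2)
```

A flow-based tracker would instead optimize the assignment over many frames at once, which lets it recover from missed detections that a frame-by-frame greedy match cannot.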

method: Baseline-TextSpotter (2015-03-30)

Authors: Lukas Neumann, Jiri Matas, Michal Busta

Description: TextSpotter is used for frame-by-frame detection. The FoT tracker of Tomas Vojir et al. is used for tracking.

TextSpotter is an unconstrained real-time end-to-end text localization and recognition method. The real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs). ERs are grouped into word regions which are recognized using an approximate nearest-neighbor classifier operating on a coarse Gaussian scale-space pyramid. A demo of the software is available online: http://www.textspotter.org

The FoT tracker [1] can be found here:
http://cmp.felk.cvut.cz/~vojirtom/

[1] Tomas Vojir and Jiri Matas, “The Enhanced Flock of Trackers”. Registration and Recognition in Images and Videos, Studies in Computational Intelligence, Springer, 2014.

method: Stradvision-1 (2015-04-17)

Authors: H. Cho, M. Sung, and B. Jun

Description: First, we extract character candidates using extremal regions (ERs). Second, we verify the extracted character candidates with a character classifier trained by Agile Learning. Afterwards, we perform text-patch matching, which greatly enhances the recall rate, and group the characters into text regions. Finally, we apply a deep neural network for character recognition. To track the text regions, we combine "detection by tracking" and "tracking by detection".

Ranking Table

Date	Method	MOTA	MOTP	IDF1
2016-04-13	Megvii-Image++	61.21%	64.95%	0.00%
2015-03-30	Baseline-TextSpotter	59.83%	69.51%	0.00%
2015-04-17	Stradvision-1	56.54%	69.21%	0.00%
2015-04-02	USTB_TexVideo II-2	50.52%	63.48%	0.00%
2015-04-02	USTB_TexVideo	45.82%	65.08%	0.00%
2015-04-02	Deep2Text I (Video)	35.39%	62.12%	0.00%
2015-04-02	USTB_TexVideo II-1	21.16%	60.46%	0.00%
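For readers unfamiliar with the column headers, the sketch below shows how the two tracking scores are commonly computed under the CLEAR MOT definitions. The competition's exact evaluation protocol is not given on this page, so treat this as an assumption: MOTA is taken as one minus the ratio of total errors (misses, false positives, identity switches) to ground-truth objects, and MOTP as the mean overlap of matched pairs. The function names and the input counts are hypothetical.

```python
from typing import Sequence


def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """Multiple Object Tracking Accuracy (CLEAR MOT definition).

    All counts are summed over every frame of the sequence.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


def motp(match_overlaps: Sequence[float]) -> float:
    """Multiple Object Tracking Precision, assumed here to be the mean
    overlap ratio over all matched ground-truth/hypothesis pairs."""
    if not match_overlaps:
        raise ValueError("at least one match is required")
    return sum(match_overlaps) / len(match_overlaps)


# Illustrative counts only, not taken from the table above.
accuracy = mota(false_negatives=30, false_positives=10,
                id_switches=5, num_ground_truth=100)
precision = motp([0.8, 0.6, 0.7])
```

Under this definition MOTA can be negative when the number of errors exceeds the number of ground-truth objects, whereas MOTP only measures how well matched boxes are localized.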
