Overview - Text in Videos

The aim of this challenge is to provide the impetus for the development of methods that take advantage of video sequences to localize and recognize text in the depicted scene.

Most existing text detection methods focus exclusively on static images. Methods designed for static images do not necessarily work well on video, and they do not take advantage of the extra information a video provides (e.g. by tracking already detected regions across frames).
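As an illustration of what tracking already detected regions could look like, the following is a minimal sketch that links word boxes between two consecutive frames by greedy intersection-over-union (IoU) matching. It is not the method used or required by the challenge. The box format (x_min, y_min, x_max, y_max), the function names and the 0.5 threshold are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(
    prev_tracks: Dict[int, Box],
    detections: List[Box],
    next_id: int,
    min_iou: float = 0.5,  # assumed threshold, not taken from the challenge
) -> Tuple[Dict[int, Box], int]:
    """Greedily assign current-frame detections to previous-frame tracks.

    Returns the updated {track_id: box} mapping and the next unused track id.
    Tracks with no sufficiently overlapping detection are dropped.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        (
            (iou(box, det), track_id, det_idx)
            for track_id, box in prev_tracks.items()
            for det_idx, det in enumerate(detections)
        ),
        reverse=True,
    )
    tracks: Dict[int, Box] = {}
    used_tracks, used_dets = set(), set()
    for score, track_id, det_idx in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if track_id in used_tracks or det_idx in used_dets:
            continue
        tracks[track_id] = detections[det_idx]
        used_tracks.add(track_id)
        used_dets.add(det_idx)
    # Any detection left unmatched starts a new track.
    for det_idx, det in enumerate(detections):
        if det_idx not in used_dets:
            tracks[next_id] = det
            next_id += 1
    return tracks, next_id
```

Calling link_detections once per frame, feeding the returned mapping back in as prev_tracks, keeps a word's identifier stable for as long as its box overlaps sufficiently from one frame to the next. A practical tracker would also need to handle missed detections and fast camera motion, which this sketch ignores.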

Videos present their own set of challenges. To mention but a few, image quality is generally worse than in static images because of motion blur and out-of-focus frames, and video compression may introduce further artefacts. Video-based applications also typically need algorithms with real-time response, although this aspect is not examined in the 2013 competition.

The challenge will be based on various short sequences (10 seconds to 1 minute long), recorded with different types of cameras and selected to represent a wide range of real-life situations. The dataset covers different scripts and languages (Spanish, French, English). Localisation ground truth will be provided at the word level for each frame.

