Overview - ICDAR2017 Competition on Multi-lingual scene text detection and script identification
RRC-MLT Call for Participation: RRC-MLT-2017-CFP1.pdf
Text detection and recognition in natural environments is a key component of many applications, ranging from business card digitization to indexing shops along a street. This new competition aims to assess the ability of state-of-the-art methods to detect text when the user faces a variety of scripts and languages, in a way that prevents the use of much a priori knowledge, as in modern cities where multiple cultures live and communicate together. This situation is also frequent when analyzing streams of content gathered on the Internet. This competition is therefore an extension of the existing Robust Reading Competition (RRC), which has been held since 2003 both at ICDAR and online.
To register for the MLT challenge of the RRC competition 2017, please:
- Send an email to firstname.lastname@example.org or email@example.com with the title "Participation in the RRC-MLT challenge"
- Register on the RRC portal as a user (as in previous RRC editions); this will give you access to the "downloads" section
Registering does not oblige you to participate or submit results; it is only an expression of interest.
You can participate in one or more tasks of the challenge. It is not obligatory to participate in all the tasks.
Motivation and relevance to ICDAR community
In this proposed competition we try to answer the question of whether text detection methods (deep learning-based or otherwise) can handle different scripts without fundamental changes to the algorithms and techniques used, or whether script-specific methods are really needed. The ultimate goal of robust reading is to be able to read the text that appears in any captured image, regardless of image source (type), image quality, text script or any other difficulty. Many research works have been devoted to solving this problem. Previous editions of the RRC competitions, and other works, have provided useful datasets to help researchers tackle each of these problems in order to robustly read text in natural scene images. In this competition, we extend the state-of-the-art work further by tackling the problem of multi-lingual text detection and script identification. In other words, text detection methods should be script-robust.
Despite the available datasets related to scene text detection or to script identification, our proposed dataset offers interesting novel aspects. The dataset is composed of complete scene images which come from 9 languages representing 6 different scripts. It combines text detection with script identification, and contains many more images than related datasets. The number of images per script is equal. This makes it a useful benchmark for the task of multi-lingual scene text detection. The dataset, along with its ground truth, also contains all the information needed to prepare text recognition systems. The languages considered are the following: Chinese, Japanese, Korean, English, French, Arabic, Italian, German and Indian.
Such a dataset is the natural extension of the RRC series, with more scripts and more images, while focusing only on intentional (or focused) text. It addresses the needs of the community for improved and robust scene text detection. Datasets following this idea are being created because they are needed by industry and regular users. However, we argue that such datasets cannot be considered for benchmarking multi-script scene text detection.
The target audience of this dataset is obviously not only the ICDAR community, but also the computer vision community. In both communities, researchers work on analyzing scenes, scene text detection and recognition, quality of text images and script identification.
The datasets available in the literature for scene text detection are mostly not multilingual. Those which contain multi-script text are either built for Indian scripts only, or contain a small number of scripts (2 to 4) with a relatively small number of images. On the other hand, datasets created for the task of script identification (classification) are composed of cropped word images.
As we focus on the multi-script and multi-lingual aspects of scene text, we list in the following table all the publicly available datasets related to multi-script text detection and to script/language identification. Note that we do not list, for example, the well-known RRC dataset for scene text detection, because it contains only English text (single script).