Overview - Out of Vocabulary Scene Text Understanding

The ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding (OOV-ST) aims to evaluate the ability of text extraction models to deal with out-of-vocabulary (OOV) words.

By OOV words we mean text instances that have NEVER been seen in the training sets of the most common scene text understanding datasets to date. This first edition of the competition focuses on text instances whose characters come from a limited alphabet: alphanumeric Latin characters plus the most common non-alphanumeric characters.
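The alphabet restriction can be illustrated with a short Python sketch that checks whether a transcription uses only allowed characters. The exact character set is not listed on this page, so `ALPHABET` (ASCII letters, digits and `string.punctuation`) and the helper `in_alphabet` are hypothetical stand-ins, not the challenge's official definition.

```python
import string

# ASSUMPTION: the official alphabet is not given here; "alphanumeric Latin
# characters plus the most common non-alphanumeric characters" is approximated
# with ASCII letters, ASCII digits and Python's string.punctuation.
ALPHABET: frozenset[str] = frozenset(
    string.ascii_letters + string.digits + string.punctuation
)


def in_alphabet(word: str, alphabet: frozenset[str] = ALPHABET) -> bool:
    """Return True if the word is non-empty and every character is allowed."""
    return bool(word) and all(ch in alphabet for ch in word)


if __name__ == "__main__":
    for w in ["8161", "TARONGA", "DYY", "café", "www.example.com"]:
        print(w, in_alphabet(w))
```

Under this assumed alphabet, all three words in Figure 1 pass, while a word containing an accented letter such as "café" does not.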

 


Figure 1. Examples of out-of-vocabulary words: (a) "8161", (b) "TARONGA", (c) "DYY".
Example (a) is from COCO-Text [2]; (b) and (c) are from MLT-19 [7].

OOV words are quite common in real-world scenarios and typically convey important information about the scene (prices, dates, toponyms, business names, URLs, etc.). As such, they are of great importance for both research and applications. However, existing benchmarks do not specifically measure performance on OOV instances, which has led to models that rely excessively on their (implicit or explicit) language model. A known limitation of current OCR systems is that they perform well on in-vocabulary words but generalize poorly to out-of-vocabulary text instances [1]. Adapting current OCR systems to recognize OOV instances is therefore a crucial next step.

Participants may train on any combination of the following commonly used scene text datasets: COCO-Text [2], ICDAR 2015 [3], HierText [4], TextOCR [5], MLT-19 [7] and OpenTextImages [6]. They are also free to generate synthetic data using the provided dictionary of the 90k most frequent English words [8]. In the evaluation phase, a test set will be released whose images contain text instances that never occur in the combined training sets of the datasets listed above.
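The OOV criterion for the test set amounts to a set-membership test against the union of all training transcriptions. In this minimal Python sketch, `build_vocabulary`, `is_oov` and the toy word lists are hypothetical; it assumes exact, case-sensitive string matching, which the challenge description does not specify.

```python
from typing import Iterable


def build_vocabulary(datasets: Iterable[Iterable[str]]) -> set[str]:
    """Return the union of the word transcriptions of every training dataset."""
    vocab: set[str] = set()
    for transcriptions in datasets:
        vocab.update(transcriptions)
    return vocab


def is_oov(word: str, vocab: set[str]) -> bool:
    """A word is OOV if it never occurs in the combined training vocabulary.

    ASSUMPTION: exact, case-sensitive matching (not specified by the challenge).
    """
    return word not in vocab


if __name__ == "__main__":
    # Hypothetical toy transcriptions standing in for real dataset annotations.
    coco_text_words = ["STOP", "EXIT", "OPEN"]
    mlt19_words = ["SALE", "OPEN"]
    vocab = build_vocabulary([coco_text_words, mlt19_words])
    test_words = ["OPEN", "TARONGA", "8161"]
    print([w for w in test_words if is_oov(w, vocab)])  # ['TARONGA', '8161']
```

Taking the union across datasets mirrors the "combined training set" wording: a word counts as in-vocabulary if it appears in any one of them.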

For more details on how to obtain the dataset, submission format, and evaluation criteria, please refer to the Tasks page.

We expect this challenge to increase interest in techniques that balance the trade-off between visual and linguistic cues.

The OOV-ST Challenge is organized in the context of the Text in Everything (TiE) Workshop, which will take place in Tel Aviv in October 2022. The workshop will host a special session on this challenge, at which the authors of top-performing methods will be invited to present their findings and insights.

References

[1] Wan, Zhaoyi, et al. "On vocabulary reliance in scene text recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

[2] Veit, Andreas, et al. "COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images." arXiv preprint arXiv:1601.07140 (2016).

[3] Karatzas, Dimosthenis, et al. "ICDAR 2015 competition on robust reading." 2015 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2015.

[4] Long, Shangbang, et al. "Towards End-to-End Unified Scene Text Detection and Layout Analysis." arXiv preprint arXiv:2203.15143 (2022).

[5] Singh, Amanpreet, et al. "TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

[6] Krylov, Ilya, Sergei Nosov, and Vladislav Sovrasov. "Open Images V5 Text Annotation and Yet Another Mask Text Spotter." Asian Conference on Machine Learning. PMLR, 2021.

[7] Nayef, Nibal, et al. "ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition—RRC-MLT-2019." 2019 International conference on document analysis and recognition (ICDAR). IEEE, 2019.

[8] Jaderberg, Max, et al. "Synthetic data and artificial neural networks for natural scene text recognition." arXiv preprint arXiv:1406.2227 (2014).

Important Dates

11 May 2022: Web site online

15 June 2022: Test set available

15 July 2022: Submission of results deadline

20 July 2022: Announcement of results and winners

October 2022: Results presentation at the TiE Workshop @ ECCV 2022