Results - Out of Vocabulary Scene Text Understanding

method: DB_threshold2_TRBA_CocoValid

Date: 2022-07-20

Authors: Yoonsik Kim, Taeho Kil, Seonghyeon Kim, Sukmin Seo

Affiliation: Clova AI OCR Team, NAVER/LINE Corp.

Description: The detector is based on Differentiable Binarization (DB) [1]. The recognizer is TRBA from WIW [2].
TRBA denotes TPS + ResNet backbone + BiLSTM + attention. The detector and the recognizer were not jointly trained. Since DB does not output an up vector, we rotated each detected region according to its aspect ratio. COCO-Text has label noise (its labels are not case-sensitive), so we cleansed the dataset using the teacher model. For training, we used a synthetic dataset (ST) and the challenge-provided real datasets.
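
The aspect-ratio rotation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `ratio_threshold` parameter, its default of 1.0, and the counter-clockwise direction are all assumptions, and the crop is modeled as a plain list of pixel rows rather than an image array.

```python
from typing import List, Sequence


def rotate_ccw(crop: Sequence[Sequence[int]]) -> List[List[int]]:
    """Rotate a 2D grid of pixels by 90 degrees counter-clockwise."""
    # zip(*crop) transposes the grid; reversing the row order completes the turn.
    return [list(row) for row in zip(*crop)][::-1]


def upright_crop(crop: Sequence[Sequence[int]],
                 ratio_threshold: float = 1.0) -> List[List[int]]:
    """Return the crop rotated so that its longer side is horizontal.

    Hypothetical rule: a crop whose height exceeds ratio_threshold * width
    is treated as vertical text and rotated; otherwise it is left unchanged.
    """
    height = len(crop)
    width = len(crop[0]) if height else 0
    if height > ratio_threshold * width:
        # The rotation direction is an assumption; the description does not state it.
        return rotate_ccw(crop)
    return [list(row) for row in crop]


tall = [[1, 2], [3, 4], [5, 6]]   # 3 rows x 2 columns
wide = [[1, 2, 3], [4, 5, 6]]     # 2 rows x 3 columns
rotated = upright_crop(tall)
unchanged = upright_crop(wide)
```

Aspect ratio alone cannot distinguish a region rotated by 90 degrees from one rotated by 270 degrees, which is why the direction here is only a placeholder.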

@article{kim2022deer, title={DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting}, author={Kim, Seonghyeon and Shin, Seung and Kim, Yoonsik and Cho, Han-Cheol and Kil, Taeho and Surh, Jaeheung and Park, Seunghyun and Lee, Bado and Baek, Youngmin}, journal={arXiv preprint arXiv:2203.05122}, year={2022} }

@inproceedings{baek2019STRcomparisons, title={What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis}, author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk}, booktitle = {International Conference on Computer Vision (ICCV)}, year={2019}, pubstate={published}, tppubtype={inproceedings} }

method: yyds

Date: 2022-07-21

Authors: yuanyeyyds

Affiliation: yyds

Description: Model: For the text detector, we used DBNet++. For the text recognizer, we used ViT as the backbone, and our model has two output heads: one uses a CTC mechanism and the other uses an attention mechanism. The ensemble of these two outputs is used as the final result.
Data: Our text detector used only the official training data. For text recognizer training, we used the official data plus 10M extra synthetic data.

Liao M, Zou Z, Wan Z, et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

method: yyvis

Date: 2022-07-21

Authors: yuanye

Affiliation: yyvis

Description: Model: For the text detector, we used DBNet++. For the text recognizer, we used ViT as the backbone, and our model has two output heads: one uses a CTC mechanism and the other uses an attention mechanism. The prediction with the higher score is used as the recognition result.
Data: Our text detector used only the official training data. For text recognizer training, we used the official data plus 10M extra synthetic data.
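
The selection rule described for yyvis, keeping whichever head's prediction has the higher score, could look roughly like this. It is a sketch under assumptions: the `Prediction` type, the function name, and the tie-breaking in favor of the CTC head are hypothetical, and the description does not say how each head's score is computed.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """One head's output: the decoded string and a confidence score."""
    text: str
    score: float


def pick_higher_score(ctc: Prediction, attention: Prediction) -> Prediction:
    """Return the prediction with the higher score.

    Ties go to the CTC head here; that choice is arbitrary and not from the source.
    """
    return ctc if ctc.score >= attention.score else attention


# Made-up example values, for illustration only.
ctc_out = Prediction(text="STOP", score=0.91)
attn_out = Prediction(text="ST0P", score=0.84)
final = pick_higher_score(ctc_out, attn_out)
```

For the comparison to be meaningful, the two scores need to be on a comparable scale, for example both expressed as sequence-level confidences.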

Liao M, Zou Z, Wan Z, et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

Ranking Table

Date	Method	Hmean	All Precision	All Recall	All Hmean	OOV Precision	OOV Recall	OOV Hmean	IV Precision	IV Recall	IV Hmean
2022-07-20	DB_threshold2_TRBA_CocoValid	0.3910	0.6408	0.4993	0.5613	0.1526	0.4229	0.2243	0.6160	0.5096	0.5578
2022-07-21	yyds	0.2868	0.5153	0.3554	0.4207	0.1063	0.3336	0.1612	0.4857	0.3583	0.4124
2022-07-21	yyvis	0.2848	0.5120	0.3531	0.4180	0.1054	0.3326	0.1600	0.4823	0.3559	0.4095
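
Each Hmean value in a Precision/Recall/Hmean group is the harmonic mean of that group's precision and recall. In the rows above, the leading Hmean value also agrees, to within rounding, with the mean of the OOV and IV Hmean values; that relationship is inferred from the numbers and is not stated on this page. The short check below recomputes both quantities from the table's precision and recall figures. The function and variable names are illustrative.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table rows above.
table = {
    "DB_threshold2_TRBA_CocoValid": {
        "All": (0.6408, 0.4993), "OOV": (0.1526, 0.4229), "IV": (0.6160, 0.5096),
    },
    "yyds": {
        "All": (0.5153, 0.3554), "OOV": (0.1063, 0.3336), "IV": (0.4857, 0.3583),
    },
    "yyvis": {
        "All": (0.5120, 0.3531), "OOV": (0.1054, 0.3326), "IV": (0.4823, 0.3559),
    },
}

# Leading Hmean column as reported in the table.
reported_leading = {
    "DB_threshold2_TRBA_CocoValid": 0.3910,
    "yyds": 0.2868,
    "yyvis": 0.2848,
}

recomputed = {}
for method, groups in table.items():
    per_group = {name: hmean(p, r) for name, (p, r) in groups.items()}
    # Assumption being checked: leading Hmean = mean of OOV Hmean and IV Hmean.
    per_group["leading"] = (per_group["OOV"] + per_group["IV"]) / 2
    recomputed[method] = per_group
    print(method, {name: round(value, 4) for name, value in per_group.items()})
```

Because the inputs are already rounded to four decimals, recomputed values can differ from the reported ones in the last digit.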
