Authors: Yoonsik Kim, Taeho Kil, Seonghyeon Kim, Sukmin Seo

Affiliation: Clova AI OCR Team, NAVER/LINE Corp.

Description: The detector is based on Differentiable Binarization (DB) [1]. The recognizer is TRBA from WIW [2].
TRBA denotes TPS + ResNet backbone + BiLSTM + Attention. The two models were not jointly trained. Since DB does not output an up vector, we rotated each detected region according to its aspect ratio. COCO-Text has label noise (its labels are not case sensitive), so we cleansed the dataset using a teacher model. We therefore used a synthetic dataset (ST) and the challenge-provided real datasets.
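
The aspect-ratio rotation described above can be sketched as follows. This is a minimal illustration, not the team's code: the function name `upright_crop`, the `ratio_threshold` parameter, and the counter-clockwise direction are all assumptions, since the description only says that regions were rotated according to aspect ratio. The crop is modeled as a nested list of pixel rows to keep the sketch dependency-free.

```python
def upright_crop(crop, ratio_threshold=1.0):
    """Return the crop rotated by 90 degrees if it is taller than it is wide.

    `crop` is a non-empty list of equal-length pixel rows (hypothetical
    representation). `ratio_threshold` is an assumed cut-off on height / width.
    """
    height, width = len(crop), len(crop[0])
    if height > ratio_threshold * width:
        # Counter-clockwise rotation: transpose, then reverse the row order.
        return [list(row) for row in zip(*crop)][::-1]
    return crop


tall = [[1, 2], [3, 4], [5, 6]]   # 3 rows x 2 columns
rotated = upright_crop(tall)      # now 2 rows x 3 columns
```

Without an up vector, a rule like this cannot tell a region rotated by 90 degrees from one rotated by 270 degrees, so the direction chosen here is arbitrary.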

method: yyds (2022-07-21)

Authors: yuanyeyyds

Affiliation: yyds

Description: Model: For the text detector, we used DBNet++. For the text recognizer, we used ViT as the backbone; the model has two output heads, one using a CTC mechanism and the other an attention mechanism. The ensemble of these two outputs is used as the final result.
Data: Our text detector used only the official training data. For text recognizer training, we used the official data and an additional 10M synthetic samples.

method: yyvis (2022-07-21)

Authors: yuanye

Affiliation: yyvis

Description: Model: For the text detector, we used DBNet++. For the text recognizer, we used ViT as the backbone; the model has two output heads, one using a CTC mechanism and the other an attention mechanism. The prediction with the higher score is used as the recognition result.
Data: Our text detector used only the official training data. For text recognizer training, we used the official data and an additional 10M synthetic samples.
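
The selection rule described above (keep whichever head's prediction has the higher score) can be sketched as follows. The `HeadPrediction` type, the `pick_higher_score` function, the example values, and the tie-breaking in favor of the CTC head are hypothetical; the description does not say how the scores are computed or how ties are resolved.

```python
from dataclasses import dataclass


@dataclass
class HeadPrediction:
    text: str     # decoded string from one recognizer head
    score: float  # confidence for that string (definition assumed)


def pick_higher_score(ctc: HeadPrediction, attn: HeadPrediction) -> HeadPrediction:
    """Return the prediction with the higher score; ties go to the CTC head (assumption)."""
    return ctc if ctc.score >= attn.score else attn


# Illustrative values only, not taken from the submission.
ctc_out = HeadPrediction(text="NAVER", score=0.91)
attn_out = HeadPrediction(text="NAVFR", score=0.78)
chosen = pick_higher_score(ctc_out, attn_out)
```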

Ranking Table

                                                  All                        OOV                        IV
Date        Method                        Hmean   Precision  Recall  Hmean   Precision  Recall  Hmean   Precision  Recall  Hmean
2022-07-20  DB_threshold2_TRBA_CocoValid  0.3910  0.6408     0.4993  0.5613  0.1526     0.4229  0.2243  0.6160     0.5096  0.5578
2022-07-21  yyds                          0.2868  0.5153     0.3554  0.4207  0.1063     0.3336  0.1612  0.4857     0.3583  0.4124
2022-07-21  yyvis                         0.2848  0.5120     0.3531  0.4180  0.1054     0.3326  0.1600  0.4823     0.3559  0.4095
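
The Hmean values in the table are consistent with the harmonic mean of precision and recall, and this can be checked with a short sketch. The helper name `hmean` is hypothetical. The reading of the leading Hmean column as the average of the OOV and IV Hmean values is an inference from the numbers, not something the page states. Because the table values are rounded to four decimals, recomputed figures can differ in the last digit.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Precision and recall for DB_threshold2_TRBA_CocoValid, copied from the table.
all_h = hmean(0.6408, 0.4993)
oov_h = hmean(0.1526, 0.4229)
iv_h = hmean(0.6160, 0.5096)

# Assumption inferred from the numbers: leading Hmean = mean of OOV and IV Hmean.
leading_h = (oov_h + iv_h) / 2

for name, value in [("All", all_h), ("OOV", oov_h), ("IV", iv_h), ("Leading", leading_h)]:
    print(f"{name}: {value:.4f}")
```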

Ranking Graphic