method: CLOVA OCR DEER (2022-07-20)

Authors: Taeho Kil, Seonghyeon Kim, Sukmin Seo

Affiliation: Clova AI OCR Team, NAVER/LINE Corp.

Description: An end-to-end scene text spotter built on a CNN backbone, a deformable transformer encoder, a location decoder, and a text decoder. The location decoder, based on a segmentation method (Differentiable Binarization), detects text regions, and the text decoder, based on a deformable transformer decoder, recognizes each instance from the image features and the detected location information. We use a single model rather than an ensemble, and all sub-modules are end-to-end trainable. For training we use the real datasets provided by this challenge (train + val splits) and a synthetic dataset. Because the COCO-Text annotations contain a lot of label noise (particularly in letter capitalization), we refined them using a teacher model trained without COCO-Text.
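
The description names Differentiable Binarization (DB) as the basis of the location decoder but gives no details. As background, the original DB paper (Liao et al., 2020) replaces the hard threshold on a probability map P with a learned threshold map T and a steep sigmoid, B_hat = 1 / (1 + exp(-k * (P - T))), using k = 50, so that binarization stays differentiable during training. The minimal sketch below implements only that per-pixel formula on nested lists; the function names and toy maps are hypothetical, and this is not the CLOVA team's implementation.

```python
import math
from typing import List

Map = List[List[float]]  # a 2-D map stored as rows of floats


def approximate_binarization(prob: Map, thresh: Map, k: float = 50.0) -> Map:
    """DB-style soft binarization: 1 / (1 + exp(-k * (P - T))) per pixel.

    k = 50 follows the original DB paper; larger k makes the step sharper.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob, thresh)
    ]


def standard_binarization(prob: Map, t: float) -> Map:
    """Hard (non-differentiable) thresholding, shown only for comparison."""
    return [[1.0 if p >= t else 0.0 for p in row] for row in prob]


# Toy 2x2 maps (hypothetical values) with a uniform threshold map of 0.5.
prob = [[0.9, 0.2], [0.6, 0.4]]
thresh = [[0.5, 0.5], [0.5, 0.5]]
soft = approximate_binarization(prob, thresh)
print([[round(v, 4) for v in row] for row in soft])  # [[1.0, 0.0], [0.9933, 0.0067]]
```

Pixels far from the threshold saturate to almost exactly 0 or 1, while pixels near it (0.6 and 0.4 here) keep a nonzero gradient with respect to both P and T, which is what lets the threshold map be learned.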

method: CLOVA OCR DEER (2022-07-19)

Authors: Sukmin Seo, Taeho Kil, Seonghyeon Kim

Affiliation: Clova AI OCR Team, NAVER/LINE Corp.

Description: An end-to-end scene text spotter built on a CNN backbone, a deformable transformer encoder, a location decoder, and a text decoder. COCO-Text labels were corrected by pseudo-labeling (upper case vs. lower case).
batch size: 28
iterations: 400k
augmentation: RandomRotate, RandomResizeScale, RandomCrop, ColorJitter
learning rate: 3e-4
weight decay: 1e-6
no ensemble
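
Neither entry says exactly how the teacher's predictions were turned into corrected COCO-Text labels. The sketch below shows one plausible rule, stated as an assumption rather than the team's actual procedure: keep each annotated word's characters, but adopt the teacher's capitalization when the teacher reads the same letters with high confidence. The function name `refine_case` and the `min_conf` threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def refine_case(
    labels: Sequence[str],
    teacher_preds: Sequence[str],
    teacher_confs: Sequence[float],
    min_conf: float = 0.9,  # hypothetical confidence cut-off
) -> Tuple[List[str], int]:
    """Return (refined_labels, num_changed).

    Assumed rule: a label is replaced by the teacher's prediction only when
    (1) both strings are identical ignoring case, so only capitalization can
    change, and (2) the teacher's confidence is at least min_conf.
    """
    if not (len(labels) == len(teacher_preds) == len(teacher_confs)):
        raise ValueError("labels, predictions and confidences must align")
    refined: List[str] = []
    changed = 0
    for label, pred, conf in zip(labels, teacher_preds, teacher_confs):
        same_letters = label.lower() == pred.lower()
        if same_letters and label != pred and conf >= min_conf:
            refined.append(pred)  # adopt the teacher's casing
            changed += 1
        else:
            refined.append(label)  # keep the original annotation
    return refined, changed


# Toy example (hypothetical words and confidences).
labels = ["STOP", "Cafe", "exit", "OPEN"]
preds = ["Stop", "Cafe", "EXIT", "Closed"]
confs = [0.95, 0.99, 0.50, 0.97]
refined, n = refine_case(labels, preds, confs)
print(refined, n)  # ['Stop', 'Cafe', 'exit', 'OPEN'] 1
```

Restricting changes to case-only differences keeps the teacher from overwriting labels it simply misreads ("OPEN" vs. "Closed"), and the confidence cut-off skips low-certainty casing guesses ("exit").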

Ranking Table

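P = Precision, R = Recall, H = Hmean; All = all words, OOV = out-of-vocabulary words, IV = in-vocabulary words.

Date       | Method                           | Hmean  | All P  | All R  | All H  | OOV P  | OOV R  | OOV H  | IV P   | IV R   | IV H
---------- | -------------------------------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------
2022-07-20 | CLOVA OCR DEER                   | 0.4242 | 0.6716 | 0.5213 | 0.5870 | 0.1856 | 0.4876 | 0.2689 | 0.6450 | 0.5259 | 0.5794
2022-07-21 | e2e text spotter - final version | 0.4239 | 0.6717 | 0.5204 | 0.5864 | 0.1858 | 0.4872 | 0.2690 | 0.6451 | 0.5249 | 0.5788
2022-07-19 | CLOVA OCR DEER                   | 0.4057 | 0.6399 | 0.5243 | 0.5764 | 0.1624 | 0.4800 | 0.2427 | 0.6129 | 0.5303 | 0.5686
2022-07-20 | DB_threshold2_TRBA_CocoValid     | 0.3910 | 0.6408 | 0.4993 | 0.5613 | 0.1526 | 0.4229 | 0.2243 | 0.6160 | 0.5096 | 0.5578
2022-07-21 | E2E_MASK                         | 0.3213 | 0.4790 | 0.5414 | 0.5083 | 0.0864 | 0.4673 | 0.1458 | 0.4520 | 0.5514 | 0.4968
2022-07-21 | yyds                             | 0.2868 | 0.5153 | 0.3554 | 0.4207 | 0.1063 | 0.3336 | 0.1612 | 0.4857 | 0.3583 | 0.4124
2022-07-21 | yyvis                            | 0.2848 | 0.5120 | 0.3531 | 0.4180 | 0.1054 | 0.3326 | 0.1600 | 0.4823 | 0.3559 | 0.4095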
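
The ranking numbers are internally consistent: each per-split Hmean is the harmonic mean 2PR / (P + R) of the precision and recall beside it, and the leading Hmean column matches the average of the OOV and IV Hmean values to within rounding. Reading the leading column as that average is an inference from the numbers, not something the page states. The sketch below checks both relations for every row; the `Row` type and tolerances are illustrative choices.

```python
from typing import NamedTuple, Tuple

PRH = Tuple[float, float, float]  # (precision, recall, hmean) for one split


class Row(NamedTuple):
    method: str
    hmean: float  # leading "Hmean" column used for ranking
    all_prh: PRH
    oov_prh: PRH
    iv_prh: PRH


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_row(row: Row, tol_h: float = 5e-4, tol_rank: float = 1e-4) -> bool:
    """True if each split's H equals hmean(P, R) and the leading Hmean equals
    the mean of the OOV and IV Hmean values, within 4-decimal rounding."""
    splits = (row.all_prh, row.oov_prh, row.iv_prh)
    splits_ok = all(abs(hmean(p, r) - h) < tol_h for p, r, h in splits)
    rank_ok = abs((row.oov_prh[2] + row.iv_prh[2]) / 2 - row.hmean) < tol_rank
    return splits_ok and rank_ok


# Values copied from the ranking table.
ROWS = [
    Row("CLOVA OCR DEER (2022-07-20)", 0.4242, (0.6716, 0.5213, 0.5870), (0.1856, 0.4876, 0.2689), (0.6450, 0.5259, 0.5794)),
    Row("e2e text spotter - final version", 0.4239, (0.6717, 0.5204, 0.5864), (0.1858, 0.4872, 0.2690), (0.6451, 0.5249, 0.5788)),
    Row("CLOVA OCR DEER (2022-07-19)", 0.4057, (0.6399, 0.5243, 0.5764), (0.1624, 0.4800, 0.2427), (0.6129, 0.5303, 0.5686)),
    Row("DB_threshold2_TRBA_CocoValid", 0.3910, (0.6408, 0.4993, 0.5613), (0.1526, 0.4229, 0.2243), (0.6160, 0.5096, 0.5578)),
    Row("E2E_MASK", 0.3213, (0.4790, 0.5414, 0.5083), (0.0864, 0.4673, 0.1458), (0.4520, 0.5514, 0.4968)),
    Row("yyds", 0.2868, (0.5153, 0.3554, 0.4207), (0.1063, 0.3336, 0.1612), (0.4857, 0.3583, 0.4124)),
    Row("yyvis", 0.2848, (0.5120, 0.3531, 0.4180), (0.1054, 0.3326, 0.1600), (0.4823, 0.3559, 0.4095)),
]

for row in ROWS:
    print(row.method, "consistent" if check_row(row) else "MISMATCH")
```

Because the published values are rounded to four decimals, the checks use small tolerances instead of exact equality; every row should report `consistent`.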
