Inactive evaluations
method: CLOVA OCR DEER (2022-07-20)
Authors: Taeho Kil, Seonghyeon Kim, Sukmin Seo
Affiliation: Clova AI OCR Team, NAVER/LINE Corp.
Description: An end-to-end scene text spotter built on a CNN backbone, a deformable transformer encoder, a location decoder, and a text decoder. The location decoder, based on a segmentation method (Differentiable Binarization), detects text regions, and the text decoder, based on the deformable transformer decoder, recognizes each instance from the image features and the detected location information. We use a single model rather than an ensemble of models, and all sub-modules are end-to-end trainable. We train on the real datasets provided by this challenge (train + val splits) and on a synthetic dataset. Because the COCO-Text dataset contains a lot of label noise (with regard to letter capitalization), we refined its annotations using a teacher model (trained without COCO-Text).
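The Differentiable Binarization step named in the description can be illustrated with a small sketch. It applies the approximate step function from the Differentiable Binarization paper, B = 1 / (1 + exp(-k (P - T))), to a probability map P and a threshold map T. The function name, the toy maps, and the use of plain Python lists are assumptions made for illustration; k = 50 is the value reported in that paper, and the description does not say whether this submission uses it.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate binary map: 1 / (1 + exp(-k * (P - T))) for each pixel."""
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prob_row, thresh_row)]
        for prob_row, thresh_row in zip(prob, thresh)
    ]


# Toy 2x2 maps (illustrative values, not taken from the submission).
prob_map = [[0.9, 0.2], [0.55, 0.5]]
thresh_map = [[0.3, 0.3], [0.5, 0.5]]
binary_map = differentiable_binarization(prob_map, thresh_map)
```

Pixels whose probability is well above the local threshold are pushed towards 1, those well below towards 0, and a pixel exactly at the threshold maps to 0.5, which keeps the operation differentiable during training.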
method: e2e text spotter - final version (2022-07-21)
Authors: Taeho Kil, Seonghyeon Kim, Sukmin Seo
Affiliation: Clova AI OCR Team, NAVER/LINE Corp.
Description: An end-to-end scene text spotter built on a CNN backbone, a deformable transformer encoder, a location decoder, and a text decoder. The location decoder, based on a segmentation method (Differentiable Binarization), detects text regions, and the text decoder, based on the deformable transformer decoder, recognizes each instance from the image features and the detected location information. We use a single model rather than an ensemble of models, and all sub-modules are end-to-end trainable. We train on the real datasets provided by this challenge (train + val splits) and on a synthetic dataset. Because the COCO-Text dataset contains a lot of label noise (with regard to letter capitalization), we refined its annotations using a teacher model (trained without COCO-Text).
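The description does not say how the teacher model's output was used to refine the capitalization of COCO-Text labels. One plausible reading, sketched below as an assumption rather than the authors' procedure, is to accept the teacher's casing only when its prediction matches the annotation once case is ignored. The function `refine_case` and the example strings are hypothetical.

```python
def refine_case(annotation, teacher_prediction):
    """Use the teacher's casing only if both strings agree when case is ignored."""
    if annotation.casefold() == teacher_prediction.casefold():
        return teacher_prediction
    # The teacher disagrees on the characters themselves, so keep the annotation.
    return annotation


# Illustrative labels only; these are not real COCO-Text annotations.
noisy_labels = ["STARBUCKS", "exit", "Coffee"]
teacher_outputs = ["Starbucks", "EXIT", "Toffee"]
refined = [refine_case(a, t) for a, t in zip(noisy_labels, teacher_outputs)]
```

Under this rule the teacher can only change capitalization, never the transcription itself, which limits the damage a wrong teacher prediction can do.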
method: CLOVA OCR DEER (2022-07-19)
Authors: Sukmin Seo, Taeho Kil, Seonghyeon Kim
Affiliation: Clova AI OCR Team, NAVER/LINE Corp.
Description: An end-to-end scene text spotter built on a CNN backbone, a deformable transformer encoder, a location decoder, and a text decoder. COCO-Text labels were corrected by pseudo-labeling (upper case vs. lower case).
batch : 28
iter : 400k
aug : RandomRotate, RandomResizeScale, RandomCrop, ColorJitter
lr : 3e-4
weight_decay : 1e-6
without ensemble
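The training settings listed above could be collected in a single configuration object. The sketch below uses a Python dataclass; the class and field names are hypothetical, and only the values come from the submission. It assumes that "400k" means 400,000 iterations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values are taken from the submission notes; field names are illustrative.
    batch_size: int = 28
    iterations: int = 400_000
    learning_rate: float = 3e-4
    weight_decay: float = 1e-6
    augmentations: tuple = (
        "RandomRotate",
        "RandomResizeScale",
        "RandomCrop",
        "ColorJitter",
    )
    ensemble: bool = False  # "without ensemble": a single model


config = TrainConfig()
# Number of training samples drawn over the whole run (batch size x iterations).
samples_seen = config.batch_size * config.iterations
```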
Date | Method | Hmean | All Precision | All Recall | All Hmean | OOV Precision | OOV Recall | OOV Hmean | IV Precision | IV Recall | IV Hmean
---|---|---|---|---|---|---|---|---|---|---|---
2022-07-20 | CLOVA OCR DEER | 0.4242 | 0.6716 | 0.5213 | 0.5870 | 0.1856 | 0.4876 | 0.2689 | 0.6450 | 0.5259 | 0.5794
2022-07-21 | e2e text spotter - final version | 0.4239 | 0.6717 | 0.5204 | 0.5864 | 0.1858 | 0.4872 | 0.2690 | 0.6451 | 0.5249 | 0.5788
2022-07-19 | CLOVA OCR DEER | 0.4057 | 0.6399 | 0.5243 | 0.5764 | 0.1624 | 0.4800 | 0.2427 | 0.6129 | 0.5303 | 0.5686
2022-07-20 | DB_threshold2_TRBA_CocoValid | 0.3910 | 0.6408 | 0.4993 | 0.5613 | 0.1526 | 0.4229 | 0.2243 | 0.6160 | 0.5096 | 0.5578
2022-07-21 | E2E_MASK | 0.3213 | 0.4790 | 0.5414 | 0.5083 | 0.0864 | 0.4673 | 0.1458 | 0.4520 | 0.5514 | 0.4968
2022-07-21 | yyds | 0.2868 | 0.5153 | 0.3554 | 0.4207 | 0.1063 | 0.3336 | 0.1612 | 0.4857 | 0.3583 | 0.4124
2022-07-21 | yyvis | 0.2848 | 0.5120 | 0.3531 | 0.4180 | 0.1054 | 0.3326 | 0.1600 | 0.4823 | 0.3559 | 0.4095
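
The Hmean values in the table are the harmonic mean of precision and recall, which can be checked against any row. The sketch below recomputes them for the 2022-07-20 CLOVA OCR DEER row. The leading Hmean column also appears to equal the mean of the OOV and IV Hmean values; that is inferred from the numbers and is not stated in the table, and small differences from rounding of the published precision and recall are expected.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the 2022-07-20 CLOVA OCR DEER row.
all_words = hmean(0.6716, 0.5213)
oov = hmean(0.1856, 0.4876)
iv = hmean(0.6450, 0.5259)

# Assumption: the first Hmean column is the mean of the OOV and IV Hmean.
average = (oov + iv) / 2
```

Compared with the table, `all_words`, `oov`, and `iv` land within rounding of 0.5870, 0.2689, and 0.5794, and `average` is close to the reported 0.4242.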