Authors: Yeongyu Kim, Jeasung Park

Affiliation: NHN Cloud

Email: yeg.kim@nhn.com

Description: Semi-supervised learning can improve classification performance by exploiting unlabeled raw images. We investigated the effect of a consistency or contrastive loss for training on unlabeled images, and used the usual cross-entropy loss for labeled data. The training dataset provided by the OOV challenge organizers and synthetic data (MJ, ST) were used as labeled data, and word images cropped by a text detector from open benchmark datasets (TextVQA, ST-VQA, ...) were used as unlabeled data.
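
As a rough illustration of how a supervised cross-entropy term and an unsupervised consistency term can be combined, here is a minimal pure-Python sketch. It is not the authors' implementation: the description does not state the form of the consistency loss, the augmentations, or the weighting, so the mean-squared-difference consistency term, the two-view setup, and the `unlabeled_weight` parameter are all assumptions made for illustration. Each sample is reduced to a single vector of class logits for brevity.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Supervised loss for one labeled sample."""
    return -math.log(softmax(logits)[target_index])


def consistency_loss(logits_view_a, logits_view_b):
    """Assumed consistency term: mean squared difference between the
    predicted distributions for two augmented views of one unlabeled image."""
    p = softmax(logits_view_a)
    q = softmax(logits_view_b)
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def semi_supervised_loss(labeled, unlabeled, unlabeled_weight=1.0):
    """labeled: list of (logits, target_index) pairs.
    unlabeled: list of (logits_view_a, logits_view_b) pairs.
    unlabeled_weight is a hypothetical balancing factor."""
    supervised = sum(cross_entropy(l, t) for l, t in labeled) / len(labeled)
    unsupervised = sum(consistency_loss(a, b) for a, b in unlabeled) / len(unlabeled)
    return supervised + unlabeled_weight * unsupervised
```

When the two views of every unlabeled image yield identical predictions, the consistency term vanishes and the total reduces to the supervised loss alone.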

Authors: Xuhua Ren, Lu Wang

Email: renxuhua1993@gmail.com

Description: Scene Text Recognition is an important component in various vision and language tasks. Recognizing out-of-vocabulary (OOV) words remains a challenge, and some studies suggest distinguishing between in-vocabulary (IV) and OOV words. To address this issue, we present two novel contributions. First, we propose a novel pseudo-label generation module that combines character detection and image inpainting modules to produce substantial training data. Second, we introduce an approach that optimizes the geodesic distance margins to reduce the impact of noisy samples in pseudo-labels on model convergence during training.

Authors: Yeongyu Kim

Affiliation: NHN Cloud

Email: yeg.kim@nhn.com

Description: The OOV (out-of-vocabulary) task requires recognizing words that never appear as labels in the training data. We use adaptive positional encoding and our own Macaron-style transformer encoder. A permutation algorithm was applied to the decoder to make the most of the label combinations in the training data. Synthetic data (MJ, ST) is used along with the provided OOV training data.

Ranking Table

| Date | Method | CRW | ED | IV CRW | IV ED | OOV CRW |
|---|---|---|---|---|---|---|
| 2023-06-27 | Semi Supervised Learning for OOV Text Recognition - NHN Cloud | 71.92% | 92166 | 83.10% | 36494 | 60.74% |
| 2023-03-04 | Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss | 70.98% | 100594 | 82.81% | 42608 | 59.15% |
| 2023-06-16 | Optimized Transformer for OOV Text Recognition - NHN Cloud | 70.82% | 98646 | 81.92% | 38529 | 59.71% |
| 2024-02-26 | Self-Supervised Learning for OOV Text Recognition - HuiGuan | 70.38% | 108657 | 81.92% | 49290 | 58.84% |
| 2022-07-21 | OCRFLY_V2 | 70.31% | 123947 | 81.02% | 46048 | 59.61% |
| 2023-02-27 | HuiGuanV2 | 70.28% | 110990 | 81.73% | 49889 | 58.83% |
| 2022-07-21 | oov3decode | 70.22% | 94259 | 81.58% | 40175 | 58.86% |
| 2022-07-21 | Vision Transformer Based Method | 70.00% | 94701 | 81.36% | 40187 | 58.64% |
| 2022-07-21 | dat | 69.90% | 96513 | 80.78% | 40082 | 59.03% |
| 2022-07-20 | ocrfly | 69.83% | 131232 | 80.63% | 53243 | 59.03% |
| 2022-07-21 | ggui | 69.80% | 96597 | 80.74% | 40171 | 58.86% |
| 2022-07-21 | spring | 69.74% | 96477 | 80.74% | 40115 | 58.74% |
| 2022-07-21 | DataMatters | 69.68% | 96544 | 80.71% | 40177 | 58.65% |
| 2022-07-20 | Cropped Recognition | 69.65% | 108766 | 80.63% | 44958 | 58.68% |
| 2022-07-21 | MaskOCR | 69.63% | 108894 | 80.60% | 44971 | 58.65% |
| 2022-07-20 | SCATTER | 69.58% | 113482 | 79.72% | 43890 | 59.45% |
| 2022-07-20 | Summer | 68.77% | 103211 | 79.48% | 42118 | 58.06% |
| 2022-07-18 | let me see see | 68.46% | 116503 | 80.81% | 51165 | 56.11% |
| 2022-07-20 | Using only real data | 68.28% | 118185 | 79.28% | 48517 | 57.27% |
| 2023-04-07 | test1 | 68.21% | 123384 | 79.73% | 56472 | 56.68% |
| 2022-08-11 | Baseline - SCATTER_v2 | 66.68% | 128219 | 77.98% | 52535 | 55.38% |
| 2022-07-18 | PTViT | 66.29% | 120449 | 77.52% | 49410 | 55.06% |
| 2022-07-20 | demo | 65.86% | 124347 | 77.25% | 48907 | 54.47% |
| 2022-08-11 | Baseline - CLOVA_v2 | 64.97% | 138479 | 75.98% | 54346 | 53.96% |
| 2022-10-19 | attn | 64.02% | 144275 | 76.47% | 64446 | 51.57% |
| 2022-07-19 | TRBA_CocoValid_InfRotation2.0_SpaceRemove | 63.98% | 132781 | 77.76% | 60693 | 50.20% |
| 2022-07-19 | HuiGuan | 63.73% | 162870 | 74.77% | 68926 | 52.69% |
| 2022-10-18 | ctc | 63.51% | 141100 | 75.63% | 63866 | 51.39% |
| 2022-07-18 | exp5_merge | 54.87% | 143070 | 70.93% | 57786 | 38.81% |
| 2022-07-20 | EOCR: Ensemble Optical Character Recognition | 46.66% | 350166 | 55.30% | 113317 | 38.02% |
| 2022-07-17 | BASELINE - Official Clova | 44.47% | 365566 | 52.61% | 114101 | 36.34% |
| 2022-07-19 | NNRC | 38.54% | 405603 | 45.36% | 136384 | 31.73% |
| 2022-07-19 | NN | 37.17% | 426074 | 43.38% | 144032 | 30.97% |
| 2022-07-18 | Cluster Character Loss in Scene Text Recognition | 31.06% | 552570 | 47.40% | 202087 | 14.73% |
| 2022-07-20 | Transformer for multi-language OCR | 0.00% | | 0.00% | | 0.00% |
| 2022-07-21 | TEST | 0.00% | | 0.00% | | 0.00% |
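
To make the ranking columns concrete, the sketch below shows one way such scores could be computed. It assumes that CRW means the percentage of correctly recognized words (exact string match) and ED means the summed Levenshtein edit distance; neither definition is spelled out in the table itself. The first CRW value in each row equals the mean of the IV and OOV CRW values (for the top entry, (83.10 + 60.74) / 2 = 71.92), and the code follows that. Summing the IV and OOV edit distances for the overall ED, the case-sensitive comparison, and the names `edit_distance` and `evaluate` are assumptions made for illustration, not the official evaluation script.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(samples):
    """samples: iterable of (prediction, ground_truth, is_oov) triples.
    Returns CRW (percent) and ED per subset plus an overall entry."""
    stats = {"IV": [0, 0, 0], "OOV": [0, 0, 0]}  # [correct, total, edit distance]
    for pred, truth, is_oov in samples:
        s = stats["OOV" if is_oov else "IV"]
        s[0] += int(pred == truth)  # assumed: exact, case-sensitive match
        s[1] += 1
        s[2] += edit_distance(pred, truth)
    result = {}
    for name, (correct, total, ed) in stats.items():
        crw = 100.0 * correct / total if total else 0.0
        result[name] = {"CRW": crw, "ED": ed}
    result["overall"] = {
        # Mean of the two subsets, matching the relationship seen in the table.
        "CRW": (result["IV"]["CRW"] + result["OOV"]["CRW"]) / 2,
        # Assumed: overall ED is the sum over both subsets.
        "ED": result["IV"]["ED"] + result["OOV"]["ED"],
    }
    return result
```

Averaging the two subsets, rather than pooling all words, gives IV and OOV words equal weight regardless of how many test words fall into each group.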

Ranking Graphic