method: Upstage KR (2023-04-01)
Authors: Dahyun Kim, Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Sehwan Joo
Affiliation: Upstage
Description: For Task 2, we use a cascade approach in which the pipeline is split into 1) text detection and 2) text recognition. For text detection, we reuse the Task 1 methodology. For text recognition, we use the ParSeq [1] architecture with its visual feature extractor replaced by SwinV2 [2].
We pretrain the text recognizer on synthetic data before fine-tuning it on the HierText dataset. We use an in-house synthetic data generator, derived from the open-source SynthTIGER [3], to render word images from English and Korean corpora: 10M English/Korean word images with a horizontal layout and 5M with a vertical layout. For the final submission, we use an ensemble of three text recognizers for strong and stable performance; a schematic sketch of the cascade and ensemble follows the references below.
[1] Bautista, D. and Atienza, R. Scene text recognition with permuted autoregressive sequence models. In ECCV 2022.
[2] Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., et al. Swin Transformer V2: Scaling up capacity and resolution. In CVPR 2022.
[3] Yim, M., Kim, Y., Cho, H.C. and Park, S. SynthTIGER: Synthetic Text Image GEneratoR towards better text recognition models. In ICDAR 2021.
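The pipeline described above is a detection-then-recognition cascade with a three-model recognition ensemble. The following is a minimal sketch of how such a cascade could be wired up, assuming a PyTorch setting; the detector callable, the recognizer call signature, the greedy decoder, and the probability-averaging ensemble are illustrative assumptions, not the submitted implementation.

```python
# Minimal sketch (not the submitted system): a detector proposes word boxes,
# each crop is scored by several recognizers, and their per-character
# probabilities are averaged before greedy decoding.
from typing import Callable, List, Sequence, Tuple

import torch


def ensemble_recognize(crop: torch.Tensor,
                       recognizers: Sequence[torch.nn.Module],
                       charset: str) -> str:
    """Average softmax outputs of several recognizers, then decode greedily."""
    with torch.no_grad():
        # Each recognizer is assumed to map a (1, 3, H, W) crop to
        # (1, T, len(charset) + 1) logits, with index 0 reserved for <eos>.
        probs = torch.stack(
            [rec(crop).softmax(dim=-1) for rec in recognizers]
        ).mean(dim=0)[0]                        # (T, len(charset) + 1)
    chars = []
    for idx in probs.argmax(dim=-1).tolist():   # greedy decoding
        if idx == 0:                            # stop at <eos>
            break
        chars.append(charset[idx - 1])
    return "".join(chars)


def run_cascade(image: torch.Tensor,
                detector: Callable[[torch.Tensor], List[Tuple[int, int, int, int]]],
                recognizers: Sequence[torch.nn.Module],
                charset: str) -> List[Tuple[Tuple[int, int, int, int], str]]:
    """Detect word boxes on a CHW image, then recognize each cropped word."""
    results = []
    for (x0, y0, x1, y1) in detector(image):
        crop = image[:, y0:y1, x0:x1].unsqueeze(0)   # (1, 3, h, w); resizing omitted
        results.append(((x0, y0, x1, y1),
                        ensemble_recognize(crop, recognizers, charset)))
    return results
```

In the submitted system the detector would be the Task 1 model and each recognizer a ParSeq variant with a SwinV2 backbone; here both are left as injectable callables.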
method: Upstage KR (2023-03-30)
Authors: Dahyun Kim, Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Sehwan Joo
Affiliation: Upstage
Description: For Task 2, we use a cascade approach in which the pipeline is split into 1) text detection and 2) text recognition. For text detection, we reuse the Task 1 methodology. For text recognition, we use the ParSeq [1] architecture with its visual feature extractor replaced by SwinV2 [2].
[1] Bautista, D. and Atienza, R. Scene text recognition with permuted autoregressive sequence models. In ECCV 2022.
[2] Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., et al. Swin Transformer V2: Scaling up capacity and resolution. In CVPR 2022.
method: Upstage KR (2023-03-31)
Authors: Dahyun Kim, Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Sehwan Joo
Affiliation: Upstage
Description: For Task 2, we use a cascade approach in which the pipeline is split into 1) text detection and 2) text recognition. For text detection, we reuse the Task 1 methodology. For text recognition, we use the ParSeq [1] architecture with its visual feature extractor replaced by SwinV2 [2]; a minimal recognizer sketch follows the references below.
[1] Bautista, D. and Atienza, R. Scene text recognition with permuted autoregressive sequence models. In ECCV 2022.
[2] Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., et al. Swin Transformer V2: Scaling up capacity and resolution. In CVPR 2022.
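All three Upstage KR submissions use ParSeq with its visual feature extractor swapped for SwinV2. The sketch below is a stand-in rather than the actual ParSeq code: it pairs a SwinV2 backbone from timm with a small autoregressive Transformer decoder to show the overall shape of such a recognizer. The timm model name, hidden size, charset size, and single-layer decoder are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): a SwinV2 encoder from
# timm feeding a small autoregressive Transformer decoder, standing in for
# ParSeq's permuted-autoregressive decoder.
import timm
import torch
import torch.nn as nn


class SwinV2Recognizer(nn.Module):
    def __init__(self, charset_size: int = 100, d_model: int = 256):
        super().__init__()
        # SwinV2 encoder; num_classes=0 drops the classification head.
        self.encoder = timm.create_model(
            "swinv2_tiny_window8_256", pretrained=False, num_classes=0
        )
        self.proj = nn.Linear(self.encoder.num_features, d_model)
        self.embed = nn.Embedding(charset_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, charset_size)

    def forward(self, images: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, 256, 256); targets: (B, T) token ids (teacher forcing).
        feats = self.encoder.forward_features(images)
        if feats.dim() == 4:                  # newer timm returns (B, H, W, C)
            feats = feats.flatten(1, 2)       # -> (B, H*W, C)
        memory = self.proj(feats)
        tgt = self.embed(targets)
        causal = torch.triu(                  # standard causal attention mask
            torch.full((targets.size(1), targets.size(1)), float("-inf")),
            diagonal=1,
        )
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)                 # (B, T, charset_size) logits


# Example forward pass with random tensors.
model = SwinV2Recognizer()
logits = model(torch.randn(2, 3, 256, 256), torch.randint(0, 100, (2, 10)))
```

ParSeq itself decodes over multiple permuted factorization orders; the single causal order used here is a simplification for illustration.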
Word

Date | Method | PQ | Fscore | Precision | Recall | Tightness
---|---|---|---|---|---|---
2023-04-01 | Upstage KR | 0.7000 | 0.7958 | 0.8205 | 0.7725 | 0.8797
2023-03-30 | Upstage KR | 0.6961 | 0.7888 | 0.8197 | 0.7602 | 0.8825
2023-03-31 | Upstage KR | 0.6961 | 0.7888 | 0.8197 | 0.7602 | 0.8825
2023-04-02 | DeepSE End-to-End Text Detection and Recognition Model | 0.6746 | 0.7793 | 0.8805 | 0.6989 | 0.8657
2023-04-01 | Clova DEER | 0.6070 | 0.7695 | 0.7791 | 0.7602 | 0.7889
2023-04-02 | Ensemble of three task-specific Clova DEER | 0.5984 | 0.7615 | 0.7763 | 0.7473 | 0.7859
2023-03-30 | DBNet++ and SATRN | 0.5162 | 0.7164 | 0.8276 | 0.6315 | 0.7206
2023-04-01 | keba | 0.4535 | 0.5415 | 0.6764 | 0.4515 | 0.8375