method: Upstage KR2023-04-01

Authors: Dahyun Kim, Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Sehwan Joo

Affiliation: Upstage

Description: For Task 2, we use a cascade approach in which the pipeline is split into 1) text detection and 2) text recognition. For text detection, we reuse the Task 1 methodology. For text recognition, we use the ParSeq [1] architecture with the visual feature extractor replaced by SwinV2 [2].
We pretrain the text recognizer on synthetic data before fine-tuning it on the HierText dataset. We use an in-house synthetic data generator, derived from the open-source SynthTiger [3], to generate word images from English and Korean corpora. We generate 10M English/Korean word images with a horizontal layout and 5M with a vertical layout. For the final submission, we use an ensemble of three text recognizers for strong and stable performance.
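The cascade above can be sketched as a two-stage loop: detect word boxes, crop each one, then merge the three recognizers' outputs. This is a minimal sketch, not the authors' implementation; the per-word majority vote and the `detect`/`crop`/recognizer callables are assumptions, since the description does not specify how the ensemble combines predictions.

```python
from collections import Counter

def vote(predictions):
    """Majority vote over one word's predictions from several recognizers.
    Ties are broken by recognizer order (Counter keeps insertion order)."""
    return Counter(predictions).most_common(1)[0][0]

def recognize_cascade(image, detect, crop, recognizers):
    """Two-stage cascade: detect word boxes, crop each box out of the image,
    run every recognizer on the crop, and merge outputs by majority vote."""
    return [vote([rec(crop(image, box)) for rec in recognizers])
            for box in detect(image)]
```

With stub detector and recognizers, `recognize_cascade` returns one string per detected box, so a disagreement among the three recognizers on a word is resolved by the two that agree.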

[1] Bautista, D., & Atienza, R. (2022). Scene text recognition with permuted autoregressive sequence models. In ECCV 2022.
[2] Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., ... & Guo, B. (2022). Swin Transformer V2: Scaling up capacity and resolution. In CVPR 2022.
[3] Yim, M., Kim, Y., Cho, H.-C., & Park, S. (2021). SynthTIGER: Synthetic text image generator towards better text recognition models. In ICDAR 2021.

method: Upstage KR2023-03-30

Authors: Dahyun Kim, Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Sehwan Joo

Affiliation: Upstage

Description: For Task 2, we use a cascade approach in which the pipeline is split into 1) text detection and 2) text recognition. For text detection, we reuse the Task 1 methodology. For text recognition, we use the ParSeq [1] architecture with the visual feature extractor replaced by SwinV2 [2].

[1] Bautista, D., & Atienza, R. (2022). Scene text recognition with permuted autoregressive sequence models. In ECCV 2022.
[2] Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., ... & Guo, B. (2022). Swin Transformer V2: Scaling up capacity and resolution. In CVPR 2022.

method: Upstage KR2023-03-31

Authors: Dahyun Kim, Yunsu Kim, Seung Shin, Bibek Chaudhary, Sanghoon Kim, Sehwan Joo

Affiliation: Upstage

Description: For Task 2, we use a cascade approach in which the pipeline is split into 1) text detection and 2) text recognition. For text detection, we reuse the Task 1 methodology. For text recognition, we use the ParSeq [1] architecture with the visual feature extractor replaced by SwinV2 [2].

[1] Bautista, D., & Atienza, R. (2022). Scene text recognition with permuted autoregressive sequence models. In ECCV 2022.
[2] Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., ... & Guo, B. (2022). Swin Transformer V2: Scaling up capacity and resolution. In CVPR 2022.

Ranking Table

Word
Date       | Method                                                 | PQ     | Fscore | Precision | Recall | Tightness
2023-04-01 | Upstage KR                                             | 0.7000 | 0.7958 | 0.8205    | 0.7725 | 0.8797
2023-03-30 | Upstage KR                                             | 0.6961 | 0.7888 | 0.8197    | 0.7602 | 0.8825
2023-03-31 | Upstage KR                                             | 0.6961 | 0.7888 | 0.8197    | 0.7602 | 0.8825
2023-04-02 | DeepSE End-to-End Text Detection and Recognition Model | 0.6746 | 0.7793 | 0.8805    | 0.6989 | 0.8657
2023-04-02 | DeepSE End-to-End Text Detection and Recognition Model | 0.6746 | 0.7793 | 0.8805    | 0.6989 | 0.8657
2023-03-24 | NVTextSpotter                                          | 0.6357 | 0.7410 | 0.8094    | 0.6834 | 0.8578
2023-03-17 | NVTextSpotter                                          | 0.6187 | 0.7232 | 0.8054    | 0.6562 | 0.8555
2023-04-01 | Clova DEER                                             | 0.6070 | 0.7695 | 0.7791    | 0.7602 | 0.7889
2023-04-02 | Ensemble of three task-specific Clova DEER             | 0.5984 | 0.7615 | 0.7763    | 0.7473 | 0.7859
2023-03-29 | SCUT-HUAWEI                                            | 0.5812 | 0.7341 | 0.7438    | 0.7246 | 0.7917
2023-03-30 | DBNet++ and SATRN                                      | 0.5162 | 0.7164 | 0.8276    | 0.6315 | 0.7206
2023-04-01 | keba                                                   | 0.4535 | 0.5415 | 0.6764    | 0.4515 | 0.8375
2023-05-15 | nn                                                     | 0.4292 | 0.6068 | 0.6957    | 0.5381 | 0.7072
2023-05-12 | adaptive_clustering                                    | 0.3918 | 0.5370 | 0.6867    | 0.4409 | 0.7295
2023-05-12 | fixed_clustering                                       | 0.3918 | 0.5370 | 0.6867    | 0.4409 | 0.7295
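As a consistency check on the columns: Fscore is the harmonic mean of Precision and Recall, and the table values are consistent with PQ being the F-score weighted by mean tightness (PQ = Fscore x Tightness), as in HierText's panoptic-style metric; treating that relation as an assumption, the top row reproduces to four decimal places:

```python
def harmonic_mean(p, r):
    """F-score as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Top row of the ranking table (Upstage KR, 2023-04-01).
precision, recall, tightness = 0.8205, 0.7725, 0.8797
fscore = harmonic_mean(precision, recall)  # matches the Fscore column, 0.7958
pq = fscore * tightness                    # matches the PQ column, 0.7000
```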

Ranking Graphic