- Task 1 - Text Localization
- Task 2 - Script identification
- Task 3 - Joint text detection and script identification
method: TH (2020-04-16)
Authors: Tsinghua University and Hyundai Motor Group AIRS Company
Email: Shanyu Xiao: xiaosy19@mails.tsinghua.edu.cn
Description: We have built an end-to-end scene text spotter based on Mask R-CNN and Transformer. It uses a ResNeXt-101 backbone with multi-scale training and testing.
method: Sogou_OCR (2019-11-08)
Authors: Xudong Rao, Lulu Xu, Long Ma, Xuefeng Su
Description: An arbitrary-shaped text detection method based on Mask R-CNN. We use ResNeXt-152 as the backbone and adopt multi-scale training and testing to obtain the final results.
method: AntAI-Cognition (2020-04-22)
Authors: Qingpei Guo, Yudong Liu, Pengcheng Yang, Yonggang Li, Yongtao Wang, Jingdong Chen, Wei Chu
Affiliation: Ant Group & PKU
Email: qingpei.gqp@antgroup.com
Description: We are from Ant Group & PKU. Our approach is an ensemble of three text detection models. The models mainly follow the Mask R-CNN framework [1] and differ in their backbones (ResNext101-64x4d [2], CBNet [3], ResNext101-32x32d_wsl [4]). A GBDT [5] is trained to normalize confidence scores and to select the highest-quality quadrilateral boxes from the outputs of all text detection models. Multi-scale training and testing are adopted for all base models. We also add the ICDAR19 MLT datasets to the training data; both the training and validation sets are used to obtain the final result.
[1] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[2] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1492-1500.
[3] Liu Y, Wang Y, Wang S, et al. CBNet: A novel composite backbone network architecture for object detection[J]. arXiv preprint arXiv:1909.03625, 2019.
[4] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
[5] Ke G, Meng Q, Finley T, et al. LightGBM: A highly efficient gradient boosting decision tree[C]//Advances in Neural Information Processing Systems. 2017: 3146-3154.
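The selection step in the AntAI-Cognition description (keeping the best box when several models detect the same text) can be illustrated with a simplified sketch. This is not the authors' implementation. It assumes the scores from the different models are already comparable (the submission trains a GBDT for that, which is not reproduced here), it uses axis-aligned boxes as a stand-in for quadrilaterals, and the names `iou`, `select_best`, and the `iou_thr` threshold are hypothetical.

```python
from typing import List, Tuple

# Assumption: axis-aligned (x1, y1, x2, y2) boxes stand in for quadrilaterals.
Box = Tuple[float, float, float, float]
# Assumption: the score is already normalized so it is comparable across models.
Detection = Tuple[Box, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_best(per_model: List[List[Detection]], iou_thr: float = 0.5) -> List[Detection]:
    """Pool detections from all models and greedily keep the highest-scoring
    box among those that overlap each other by at least iou_thr."""
    pooled = sorted(
        (det for dets in per_model for det in dets),
        key=lambda det: det[1],
        reverse=True,
    )
    kept: List[Detection] = []
    for box, score in pooled:
        # Keep the box only if it does not overlap an already kept, better box.
        if all(iou(box, kept_box) < iou_thr for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Two hypothetical models: they agree on one text region, and model_b finds one more.
model_a = [((0.0, 0.0, 10.0, 10.0), 0.9)]
model_b = [((1.0, 1.0, 11.0, 11.0), 0.8), ((50.0, 50.0, 60.0, 60.0), 0.7)]
merged = select_best([model_a, model_b])
```

In this toy input the two overlapping boxes collapse to the higher-scoring one, and the non-overlapping box from the second model is retained.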
Date | Method | Hmean | Precision | Recall | Average Precision
---|---|---|---|---|---
2020-04-16 | TH | 58.66% | 49.57% | 71.82% | 49.46% | |||
2019-11-08 | Sogou_OCR | 56.69% | 47.66% | 69.94% | 47.27% | |||
2020-04-22 | AntAI-Cognition | 56.55% | 46.52% | 72.10% | 46.56% | |||
2021-03-21 | OSKDet | 54.65% | 46.16% | 66.97% | 47.35% | |||
2019-05-08 | Baidu-VIS | 53.38% | 42.87% | 70.72% | 29.94% | |||
2019-05-30 | PMTD | 53.34% | 42.54% | 71.51% | 49.93% | |||
2019-08-08 | JDAI | 53.26% | 42.56% | 71.13% | 50.82% | |||
2019-06-02 | NJU-ImagineLab | 52.80% | 41.06% | 73.94% | 49.97% | |||
2019-03-23 | PMTD | 50.87% | 40.87% | 67.37% | 45.30% | |||
2024-03-14 | gts | 49.81% | 43.65% | 58.01% | 38.40% | |||
2019-11-05 | baseline_maskrcnn | 49.44% | 39.46% | 66.18% | 41.30% | |||
2021-05-03 | NCU_MSP | 49.43% | 40.43% | 63.59% | 25.46% | |||
2019-06-11 | 4Paradigm-Data-Intelligence | 49.41% | 37.84% | 71.18% | 26.31% | |||
2019-08-20 | juxinli | 48.99% | 39.66% | 64.07% | 43.08% | |||
2019-05-23 | 4Paradigm-Data-Intelligence | 48.88% | 37.61% | 69.78% | 25.79% | |||
2021-11-02 | fpa | 48.83% | 39.48% | 63.97% | 42.91% | |||
2021-03-25 | NCU_MSP | 48.36% | 39.02% | 63.55% | 24.40% | |||
2018-11-20 | Pixel-Anchor | 47.93% | 40.71% | 58.24% | 22.48% | |||
2022-04-22 | TextBPN++(ResNet-50 with DCN) | 47.71% | 38.01% | 64.06% | 24.39% | |||
2019-03-29 | GNNets (single scale) | 46.72% | 38.47% | 59.46% | 30.88% | |||
2018-11-28 | CRAFT | 46.15% | 37.37% | 60.33% | 22.35% | |||
2019-12-13 | BDN | 46.05% | 34.06% | 71.03% | 23.70% | |||
2023-05-22 | DeepSolo++ (ResNet-50) | 45.81% | 40.74% | 52.32% | 33.27% | |||
2020-09-28 | DCLNet | 45.59% | 37.12% | 59.08% | 22.03% | |||
2020-12-08 | cascade | 45.57% | 37.18% | 58.85% | 36.89% | |||
2021-12-12 | a | 45.34% | 34.44% | 66.35% | 22.94% | |||
2018-10-29 | Amap-CVLab | 44.87% | 35.48% | 61.00% | 30.08% | |||
2021-05-03 | adapt | 44.77% | 33.94% | 65.73% | 21.86% | |||
2019-03-19 | ccnet single scale | 44.47% | 36.19% | 57.66% | 28.13% | |||
2018-11-15 | USTC-NELSLIP | 44.42% | 32.85% | 68.55% | 38.69% | |||
2021-12-12 | b | 44.41% | 33.56% | 65.64% | 21.87% | |||
2020-10-16 | Drew | 44.31% | 35.67% | 58.47% | 33.52% | |||
2022-04-11 | TextBPN++(ResNet-50) | 44.27% | 36.15% | 57.09% | 21.03% | |||
2021-12-31 | TextPMs | 43.99% | 34.67% | 60.14% | 20.99% | |||
2024-04-02 | FPDIoU | 43.95% | 44.50% | 43.41% | 20.19% | |||
2023-12-17 | mlt_ch_03 | 43.73% | 33.68% | 62.30% | 21.02% | |||
2017-11-09 | EAST++ | 43.15% | 33.57% | 60.37% | 27.28% | |||
2021-03-03 | NCU_MSP_light | 42.95% | 32.62% | 62.87% | 20.29% | |||
2021-05-17 | NCU_FPN | 42.75% | 31.01% | 68.79% | 20.94% | |||
2018-12-22 | PKU_VDIG | 42.49% | 30.85% | 68.22% | 40.60% | |||
2018-01-22 | FOTS_v2 | 41.99% | 33.62% | 55.89% | 24.21% | |||
2020-12-08 | corner | 41.30% | 31.94% | 58.42% | 33.80% | |||
2018-05-18 | PSENet_NJU_ImagineLab (single-scale) | 41.03% | 31.96% | 57.29% | 17.80% | |||
2020-10-21 | gccnet-ensemble | 40.96% | 30.31% | 63.11% | 29.84% | |||
2018-12-04 | SPCNet_TongJi & UESTC (multi scale) | 40.84% | 31.29% | 58.81% | 17.97% | |||
2019-01-08 | ALGCD_CP | 40.45% | 30.10% | 61.65% | 26.49% | |||
2018-08-23 | Sogou_MM | 39.59% | 29.24% | 61.32% | 34.74% | |||
2019-07-15 | stela | 39.20% | 31.46% | 51.99% | 25.52% | |||
2018-03-12 | ATL Cangjie OCR | 38.91% | 28.76% | 60.12% | 31.21% | |||
2018-12-02 | Shape-Aware Based Scene Text Detector (single scale) | 38.02% | 28.44% | 57.34% | 15.95% | |||
2019-09-18 | mask RCNN Augment+ | 37.63% | 31.29% | 47.21% | 23.55% | |||
2017-06-28 | SCUT_DLVClab1 | 37.02% | 31.48% | 44.93% | 25.34% | |||
2019-05-30 | Thesis-SE | 34.72% | 25.80% | 53.07% | 21.64% | |||
2018-12-05 | EPTN-SJTU | 34.48% | 25.57% | 52.91% | 21.71% | |||
2018-12-13 | AutoCV | 33.27% | 22.65% | 62.61% | 27.75% | |||
2018-12-03 | SPCNet_TongJi & UESTC (single scale) | 30.87% | 21.16% | 57.04% | 11.89% | |||
2017-06-29 | SARI_FDU_RRPN_v1 | 30.72% | 22.58% | 48.02% | 19.88% | |||
2017-06-28 | SARI_FDU_RRPN_v0 | 28.73% | 19.91% | 51.53% | 24.29% | |||
2022-01-05 | dbnet_resnet18 | 26.56% | 18.74% | 45.57% | 13.99% | |||
2019-01-03 | YY AI OCR Group | 21.62% | 15.44% | 36.05% | 8.07% | |||
2017-06-30 | TH-DL | 20.20% | 16.53% | 25.97% | 9.24% | |||
2017-06-30 | Sensetime OCR | 18.68% | 10.93% | 64.03% | 27.49% | |||
2017-06-30 | linkage-ER-Flow | 18.52% | 12.13% | 39.18% | 6.15% | |||
2019-10-14 | TextSnake | 6.00% | 3.69% | 16.05% | 0.60% |
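The table is ordered by Hmean, and the listed values are consistent with the harmonic mean of Precision and Recall. The sketch below recomputes Hmean for the top three rows from their rounded percentages. The function name `hmean` is chosen here for illustration, and small deviations in the last digit are expected because the table values are themselves rounded.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, precision %, recall %, Hmean % as listed in the table)
rows = [
    ("TH", 49.57, 71.82, 58.66),
    ("Sogou_OCR", 47.66, 69.94, 56.69),
    ("AntAI-Cognition", 46.52, 72.10, 56.55),
]

for name, p, r, listed in rows:
    print(f"{name}: recomputed {hmean(p, r):.2f}%, listed {listed:.2f}%")
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a method with very high recall but low precision (for example Sensetime OCR at 64.03% recall and 10.93% precision) still ranks near the bottom.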