method: TH-DL (2022-02-07)

Authors: Ruijie Yan, Shanyu Xiao, Liangrui Peng, Gang Yao, Haodong Shi, Pei Tang, Ning Ding

Affiliation: Tsinghua University

Email: yrj17@mails.tsinghua.edu.cn

Description: For detection, we use an improved Mask-RCNN model. For recognition, we use a PREN2D model that consists of a primitive representation learning network (PREN) and a modified Transformer. The modified Transformer exploits contextual information, and PREN further provides global visual guidance for the decoding process to achieve better performance.

method: Baidu-VIS (2020-06-30)

Authors: VIS-VAR Team, Baidu Inc.*

Affiliation: VIS-VAR Team, Baidu Inc.*

Description: We are from the Department of Computer Vision, Baidu Inc. Our method consists of three parts: text detection, script identification, and text recognition. Text detection relies mainly on LOMO and EAST; multi-scale testing is adopted, and the final result is boosted by using ResNet-50 and Inception-v4 as different backbones. Next, a unified language classification model identifies the script of every text line. Finally, eight single-language text recognition models based on Res-SENet recognize the text line images.
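The three-stage flow described above (detect text lines, identify the script of each line, then route the line to a script-specific recognizer) can be sketched roughly as follows. This is only an illustrative skeleton, not the team's implementation: `TextLine`, `run_pipeline`, and the `fake_*` stand-ins are hypothetical names, and the stubs replace the actual LOMO/EAST detectors, the language classifier, and the Res-SENet recognizers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TextLine:
    """Hypothetical container for one detected text line."""
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1), an assumed box format
    crop: str                       # stand-in for the cropped image data


def run_pipeline(
    image: object,
    detect: Callable[[object], List[TextLine]],
    identify_script: Callable[[TextLine], str],
    recognizers: Dict[str, Callable[[TextLine], str]],
) -> List[Tuple[Tuple[int, int, int, int], str, str]]:
    """Detect lines, classify each line's script, then dispatch the line
    to the recognizer registered for that script."""
    results = []
    for line in detect(image):
        script = identify_script(line)
        text = recognizers[script](line)
        results.append((line.box, script, text))
    return results


# Stub components (assumptions for illustration only).
def fake_detect(image: object) -> List[TextLine]:
    return [
        TextLine((0, 0, 50, 10), "hello"),
        TextLine((0, 20, 50, 30), "你好"),
    ]


def fake_identify_script(line: TextLine) -> str:
    return "Latin" if line.crop.isascii() else "Chinese"


fake_recognizers = {
    "Latin": lambda line: line.crop,
    "Chinese": lambda line: line.crop,
}

results = run_pipeline(
    "image-placeholder", fake_detect, fake_identify_script, fake_recognizers
)
for box, script, text in results:
    print(box, script, text)
```

Keeping the recognizers in a dictionary keyed by script name mirrors the idea of one single-language model per script: supporting another script would only require registering another entry.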

Authors: Sicong Liu, Longhuang Wu, Shangxuan Tian, Haoxi Li, Chunchao Guo, Haibo Qin, Chang Liu, Hongfa Wang, Hongkai Chen, Qinglin Lu, Chun Yang, Xucheng Yin, Lei Xiao

Description: We are the Tencent-DPPR (Data Platform Precision Recommendation) team. Our detection method follows the Mask R-CNN framework, which uses masks to detect multi-oriented scene text. We train our text detector on the MLT-19 and MSRA-TD500 datasets with a multi-scale training approach. To obtain the final ensemble detection results, we combine two different backbones and different multi-scale testing approaches. Our recognition methods are based on CTC/Seq2Seq and a CNN with self-attention/RNN. Cropped words are then recognized by the different models to obtain ensemble results.

Ranking Table

Date | Method | Hmean | Precision | Recall | Average Precision | 1-NED | 1-NED (Case Sens.) | Hmean (Case Sens.)
2022-02-07 | TH-DL | 61.76% | 74.16% | 52.91% | 45.58% | 58.76% | 56.88% | 59.15%
2020-06-30 | Baidu-VIS | 59.72% | 72.82% | 50.62% | 41.32% | 57.26% | 56.97% | 59.01%
2019-06-04 | Tencent-DPPR Team & USTB-PRIR | 59.15% | 71.26% | 50.55% | 35.92% | 58.46% | 58.10% | 58.37%
2019-06-03 | Tencent-DPPR Team & USTB-PRIR (Method_v0.2) | 58.92% | 71.67% | 50.02% | 41.76% | 58.00% | 57.64% | 58.14%
2019-06-03 | end2end | 52.50% | 55.34% | 49.93% | 40.89% | 58.47% | 57.85% | 51.61%
2019-06-03 | CRAFTS | 51.74% | 65.68% | 42.68% | 34.95% | 48.27% | 47.75% | 50.74%
2019-05-27 | Tencent-DPPR Team & USTB-PRIR (Method_v0.1) | 51.70% | 56.12% | 47.93% | 26.88% | 56.18% | 55.65% | 50.86%
2023-05-22 | DeepSolo++ (ResNet-50) | 51.22% | 62.31% | 43.49% | 35.86% | 52.95% | 52.61% | 50.52%
2019-06-04 | mask_rcnn-transformer | 51.04% | 52.51% | 49.64% | 25.96% | 55.71% | 54.10% | 49.34%
2019-06-03 | mask_rcnn-transformer | 50.44% | 51.90% | 49.07% | 25.34% | 55.28% | 54.14% | 49.11%
2023-08-07 | spotter | 47.83% | 67.46% | 37.05% | 29.07% | 43.74% | 43.31% | 46.88%
2019-05-28 | CRAFTS(Initial) | 46.99% | 66.21% | 36.41% | 30.54% | 42.52% | 42.01% | 45.97%
2019-06-04 | Three-stage method | 40.19% | 44.37% | 36.73% | 17.82% | 46.01% | 43.86% | 37.45%
2019-06-03 | baseline | 39.55% | 39.71% | 39.39% | 15.54% | 43.30% | 40.18% | 36.58%
2019-06-03 | icdar2019_mlt_test_lqj | 38.75% | 39.88% | 37.67% | 14.87% | 49.89% | 48.95% | 37.51%
2019-06-04 | TH-DL-v2 | 37.32% | 41.22% | 34.10% | 19.73% | 46.19% | 45.68% | 36.50%
2019-06-03 | TH-DL-v1 | 34.49% | 38.10% | 31.51% | 17.48% | 42.76% | 42.25% | 33.69%
2019-06-04 | RRPN+CLTDR | 33.82% | 38.62% | 30.08% | 11.57% | 38.34% | 37.90% | 33.09%
2019-06-03 | NXB OCR | 32.07% | 34.37% | 30.06% | 10.35% | 35.48% | 35.06% | 31.50%
2019-05-27 | TH-DL | 31.69% | 35.13% | 28.87% | 14.33% | 40.39% | 39.82% | 30.79%
2019-05-27 | NXB OCR | 28.42% | 33.39% | 24.74% | 7.96% | 31.50% | 31.19% | 27.93%
2019-05-22 | E2E-MLT | 26.46% | 37.44% | 20.47% | 7.72% | 26.39% | 25.71% | 24.85%
2019-05-24 | First submission | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00%
2019-05-27 | dummy | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00%
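
The Hmean column is consistent with the harmonic mean of Precision and Recall. As a quick sanity check, the sketch below recomputes it for three rows of the ranking table; the function name `hmean` and the 0.01 tolerance (chosen because the published figures are rounded to two decimals) are assumptions made for illustration.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0  # avoids division by zero for all-zero submissions
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported Hmean) copied from the ranking table.
rows = {
    "TH-DL": (74.16, 52.91, 61.76),
    "Baidu-VIS": (72.82, 50.62, 59.72),
    "CRAFTS": (65.68, 42.68, 51.74),
}

for method, (precision, recall, reported) in rows.items():
    computed = hmean(precision, recall)
    # Inputs are rounded to two decimals, so allow a small tolerance.
    matches = abs(computed - reported) < 0.01
    print(f"{method}: computed {computed:.2f}%, reported {reported:.2f}%, match={matches}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a submission with high precision but low recall (for example the "spotter" row) ranks below more balanced submissions.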

Ranking Graphic