Method: Baidu-VIS (submitted 2020-06-30)

Authors: VIS-VAR Team, Baidu Inc.

Affiliation: VIS-VAR Team, Baidu Inc.

Description: We are from the Department of Computer Vision, Baidu Inc. Our method consists of three parts: text detection, script identification, and text recognition. Text detection relies mainly on LOMO and EAST; multi-scale testing is adopted, and the final result is an ensemble of models with ResNet-50 and Inception-v4 backbones. Next, a unified script-classification model identifies the script of every detected text line. Finally, eight single-script text recognition models based on Res-SENet recognize the text-line images.
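
A minimal sketch of this three-stage flow (detection, script identification, per-script recognition) is shown below. The detector, classifier, recognizer objects and the merge/crop callables are hypothetical placeholders used only to illustrate the routing logic; this is not Baidu's released code.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Polygon = List[Tuple[float, float]]

def run_pipeline(
    image,
    detectors: Sequence,                 # e.g. LOMO- and EAST-style detectors exposing .detect()
    merge: Callable[[List[Polygon]], List[Polygon]],  # polygon fusion over detector outputs, e.g. NMS
    crop: Callable,                      # crops/rectifies one text-line polygon from the image
    script_classifier,                   # unified script-identification model exposing .predict()
    recognizers: Dict[str, object],      # one single-script recognizer per script key
) -> List[Tuple[Polygon, str, str]]:
    # 1) Detection: run every detector (different backbones / test scales)
    #    and fuse the candidate text-line polygons.
    candidates: List[Polygon] = []
    for det in detectors:
        candidates.extend(det.detect(image))
    polygons = merge(candidates)

    # 2) Script identification, then 3) per-script recognition.
    results: List[Tuple[Polygon, str, str]] = []
    for poly in polygons:
        line_img = crop(image, poly)
        script = script_classifier.predict(line_img)    # e.g. "Latin", "Arabic"
        text = recognizers[script].recognize(line_img)
        results.append((poly, script, text))
    return results
```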

Method: Tencent-DPPR Team & USTB-PRIR

Authors: Sicong Liu, Longhuang Wu, Shangxuan Tian, Haoxi Li, Chunchao Guo, Haibo Qin, Chang Liu, Hongfa Wang, Hongkai Chen, Qinglin Lu, Chun Yang, Xucheng Yin, Lei Xiao

Description: We are the Tencent-DPPR (Data Platform Precision Recommendation) team. Our detection method follows the Mask R-CNN framework and uses instance masks to detect multi-oriented scene text. We train the text detector on the MLT-19 and MSRA-TD500 datasets and apply multi-scale training. To obtain the final ensemble detection results, we combine two different backbones with several multi-scale testing approaches. Our recognition models are based on CTC and Seq2Seq decoders over CNN features with self-attention or RNN layers; the cropped words are recognized by these different models and the results are ensembled.
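
A minimal sketch of the test-time ensembling idea (two backbones, several test scales, results fused by NMS) follows. It assumes a hypothetical detect(image, backbone, scale) callable returning scored axis-aligned boxes at the resized scale; Tencent-DPPR's actual mask outputs and fusion rules are not reproduced.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score

def iou(a: Box, b: Box) -> float:
    # Intersection-over-union of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes: List[Box], thresh: float = 0.5) -> List[Box]:
    # Keep highest-scoring boxes, dropping any box that overlaps a kept one.
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < thresh for k in kept):
            kept.append(box)
    return kept

def multi_scale_ensemble(
    image,
    detect: Callable[[object, str, float], List[Box]],       # hypothetical detect(image, backbone, scale)
    backbones: Sequence[str] = ("backbone_a", "backbone_b"),  # two-backbone ensemble (names are placeholders)
    scales: Sequence[float] = (0.5, 1.0, 1.5, 2.0),
) -> List[Box]:
    candidates: List[Box] = []
    for backbone in backbones:
        for scale in scales:
            for x1, y1, x2, y2, score in detect(image, backbone, scale):
                # Map boxes predicted at this scale back to the original image size.
                candidates.append((x1 / scale, y1 / scale, x2 / scale, y2 / scale, score))
    return nms(candidates)
```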

Ranking Table

Date | Method | Hmean | Precision | Recall | Average Precision | 1-NED | 1-NED (Case Sens.) | Hmean (Case Sens.)
2020-06-30 | Baidu-VIS | 59.72% | 72.82% | 50.62% | 41.32% | 57.26% | 56.97% | 59.01%
2019-06-04 | Tencent-DPPR Team & USTB-PRIR | 59.15% | 71.26% | 50.55% | 35.92% | 58.46% | 58.10% | 58.37%
2019-06-03 | Tencent-DPPR Team & USTB-PRIR (Method_v0.2) | 58.92% | 71.67% | 50.02% | 41.76% | 58.00% | 57.64% | 58.14%
2019-06-03 | CRAFTS | 51.74% | 65.68% | 42.68% | 34.95% | 48.27% | 47.75% | 50.74%
2019-05-27 | Tencent-DPPR Team & USTB-PRIR (Method_v0.1) | 51.70% | 56.12% | 47.93% | 26.88% | 56.18% | 55.65% | 50.86%
2019-06-04 | mask_rcnn-transformer | 51.04% | 52.51% | 49.64% | 25.96% | 55.71% | 54.10% | 49.34%
2019-06-03 | mask_rcnn-transformer | 50.44% | 51.90% | 49.07% | 25.34% | 55.28% | 54.14% | 49.11%
2019-05-28 | CRAFTS (Initial) | 46.99% | 66.21% | 36.41% | 30.54% | 42.52% | 42.01% | 45.97%
2019-06-04 | Three-stage method | 40.19% | 44.37% | 36.73% | 17.82% | 46.01% | 43.86% | 37.45%
2019-06-04 | TH-DL-v2 | 37.32% | 41.22% | 34.10% | 19.73% | 46.19% | 45.68% | 36.50%
2019-06-03 | TH-DL-v1 | 34.49% | 38.10% | 31.51% | 17.48% | 42.76% | 42.25% | 33.69%
2019-06-03 | NXB OCR | 32.07% | 34.37% | 30.06% | 10.35% | 35.48% | 35.06% | 31.50%
2019-05-27 | TH-DL | 31.69% | 35.13% | 28.87% | 14.33% | 40.39% | 39.82% | 30.79%
2019-05-27 | NXB OCR | 28.42% | 33.39% | 24.74% | 7.96% | 31.50% | 31.19% | 27.93%
2019-05-22 | E2E-MLT | 26.46% | 37.44% | 20.47% | 7.72% | 26.39% | 25.71% | 24.85%
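
For reference, the columns relate as follows: Hmean is the harmonic mean of Precision and Recall, and 1-NED is one minus the normalized edit distance between predicted and ground-truth transcriptions. The sketch below only checks that arithmetic on the Baidu-VIS row; the benchmark's full matching protocol (Average Precision, case handling) is assumed and not reproduced.

```python
def hmean(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall (the table's Hmean column).
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def levenshtein(a: str, b: str) -> int:
    # Standard edit distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def one_minus_ned(pred: str, gt: str) -> float:
    # 1-NED for a single prediction/ground-truth pair.
    denom = max(len(pred), len(gt))
    return 1.0 if denom == 0 else 1.0 - levenshtein(pred, gt) / denom

# Baidu-VIS row: 2 * 0.7282 * 0.5062 / (0.7282 + 0.5062) ~= 0.5972, i.e. 59.72%.
print(f"{hmean(0.7282, 0.5062):.4f}")        # -> 0.5972
print(one_minus_ned("Baidu", "Baldu"))       # one substitution over 5 chars -> 0.8
```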

Ranking Graphic