method: GoMatching 2024-01-19
Authors: HeHaibin, YeMaoyuan, ZhangJing, LiuJuhua, TaoDacheng
Affiliation: Wuhan University-AI.INST
Description: We extend the off-the-shelf image text spotter DeepSolo into a video text spotter via a long-short term matching module.
method: h&h_lab 2021-08-02
Authors: HUST_VLRGROUP(Dian Jin) & HUAWEI_CLOUD_EI(Jing Wang, Shenggao Zhu)
Affiliation: h&h_lab
Description: This method was designed specifically for the competition. We use the textual transcription of each word in the video to distinguish between different text objects. Using transcriptions substantially improves video text tracking performance, but it also requires supervision from the recognition results of the text instances. It is a crude but useful method. For task 4, we additionally apply post-processing to obtain a more accurate recognition result for each text object after the text trajectories have been generated.
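The entry above does not say how transcriptions are compared or what the post-processing consists of, so the following is only a minimal sketch of the general idea under stated assumptions: words are linked across frames greedily by string similarity (here `difflib.SequenceMatcher`, with a hypothetical threshold `min_sim`), and each finished trajectory receives a single transcription by majority vote. The function names are illustrative and are not taken from the h&h_lab submission.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_by_transcription(frames, min_sim=0.7):
    """Greedily link recognized words into tracks using only their text.

    frames: list of frames, each a list of recognized strings.
    min_sim: assumed similarity threshold (hypothetical value).
    Returns a list of tracks; each track is a list of (frame_idx, text).
    """
    tracks = []
    for t, words in enumerate(frames):
        used = set()  # tracks already extended (or created) in this frame
        for text in words:
            best, best_sim = None, min_sim
            for i, track in enumerate(tracks):
                if i in used:
                    continue
                # Compare against the most recent transcription of the track.
                sim = similarity(track[-1][1], text)
                if sim >= best_sim:
                    best, best_sim = i, sim
            if best is None:
                tracks.append([(t, text)])  # start a new text object
                used.add(len(tracks) - 1)
            else:
                tracks[best].append((t, text))
                used.add(best)
    return tracks


def vote_transcription(track):
    """Assumed post-processing: most frequent transcription in the track."""
    return Counter(text for _, text in track).most_common(1)[0][0]


frames = [["EXIT", "Hotel"], ["EXIT", "Hote1"], ["EX1T", "Hotel"]]
tracks = link_by_transcription(frames)
labels = [vote_transcription(track) for track in tracks]
```

A real system would normally combine this with box overlap, since two instances of the same word in one frame cannot be separated by text alone.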
method: TransDETR 2022-04-15
Authors: weijia
Affiliation: Zhejiang University&Kuaishou(MMU)
Email: weijiawu@zju.edu.cn
Description: A simple but effective end-to-end video text DEtection, Tracking, and Recognition framework (TransDETR). TransDETR offers two main advantages: 1) Unlike the explicit matching paradigm between adjacent frames, TransDETR tracks and recognizes each text instance implicitly through a dedicated query, termed a text query, over a long-range temporal sequence (more than 7 frames). 2) TransDETR is the first end-to-end trainable video text spotting framework, and it addresses the three sub-tasks (text detection, tracking, and recognition) simultaneously.
Date | Method | MOTA | MOTP | IDF1 | Mostly Matched | Partially Matched | Mostly Lost
---|---|---|---|---|---|---|---
2024-01-19 | GoMatching | 72.04% | 78.53% | 80.11% | 1002 | 205 | 160
2021-08-02 | h&h_lab | 63.76% | 77.78% | 71.08% | 673 | 381 | 313
2022-04-15 | TransDETR | 60.96% | 74.61% | 72.80% | 644 | 323 | 400
2020-02-26 | HIK_OCR | 52.98% | 74.88% | 61.85% | 618 | 253 | 487
2016-04-13 | Megvii-Image++ | 19.11% | 67.28% | 34.97% | 134 | 397 | 836
2015-04-02 | USTB_TexVideo | 15.57% | 68.47% | 28.18% | 122 | 382 | 778
2015-04-02 | Deep2Text I (Video) | 14.35% | 68.75% | 32.05% | 200 | 296 | 786
2015-04-02 | USTB_TexVideo II-2 | 13.24% | 66.61% | 21.25% | 84 | 327 | 859
2015-04-17 | Stradvision-1 | 8.98% | 70.20% | 31.95% | 122 | 432 | 813
2015-04-02 | USTB_TexVideo II-1 | 5.64% | 58.76% | 19.74% | 111 | 263 | 872
2015-03-30 | Baseline-TextSpotter | 0.00% | 0.00% | 0.00% | | |
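
For readers unfamiliar with the column names, the sketch below shows how such tracking scores are commonly computed, assuming the standard CLEAR MOT definition of MOTA and the usual identity-based definition of IDF1. The 80% and 20% coverage thresholds for the Mostly Matched and Mostly Lost columns are the common multi-object tracking convention and are an assumption here; this page does not state the exact evaluation protocol. All function names are illustrative.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def coverage_class(tracked_frames, lifespan, hi=0.8, lo=0.2):
    """Classify one ground-truth trajectory by the fraction of its life tracked.

    hi and lo are assumed thresholds (the common 80% / 20% convention).
    """
    ratio = tracked_frames / lifespan
    if ratio >= hi:
        return "mostly_matched"
    if ratio < lo:
        return "mostly_lost"
    return "partially_matched"


# Toy numbers for illustration only; they are not taken from the table.
example_mota = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
example_idf1 = idf1(idtp=80, idfp=10, idfn=30)
example_class = coverage_class(tracked_frames=9, lifespan=10)
```

Because errors are summed in the numerator, MOTA can fall below zero when a method produces more errors than there are ground-truth objects.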