Method: TransDETR - Task 4 - End-to-End - Text in Videos

method: TransDETR 2022-04-15

Authors: weijia

Affiliation: Zhejiang University & Kuaishou (MMU)

Description: A simple but effective end-to-end video text DEtection, Tracking, and Recognition framework (TransDETR). TransDETR offers two main advantages: 1) Unlike the explicit matching paradigm between adjacent frames, TransDETR tracks and recognizes each text instance implicitly through its own query, termed a text query, over a long-range temporal sequence (more than 7 frames). 2) TransDETR is the first end-to-end trainable video text spotting framework that simultaneously addresses the three sub-tasks (i.e., text detection, tracking, and recognition).
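
The idea of implicit tracking through persistent queries can be illustrated with a toy sketch. This is not the TransDETR implementation: the names `TextQuery`, `attend`, and `track` are hypothetical, and a 1-D position with a distance-based softmax stands in for the learned query embedding and the transformer attention. The point it shows is only that each query keeps its identity across frames, so no explicit matching step between adjacent frames is needed.

```python
import math
from dataclasses import dataclass


@dataclass
class TextQuery:
    """Hypothetical stand-in for a text query: one query follows one text instance."""
    track_id: int
    position: float  # toy 1-D state; the real model would carry a learned embedding


def attend(query_pos, candidates, temperature=1.0):
    """Soft attention of one query over the candidate positions of a frame.

    Candidates closer to the query's current state receive larger weights;
    the result is the attention-weighted average position.
    """
    weights = [math.exp(-abs(query_pos - c) / temperature) for c in candidates]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, candidates)) / total


def track(queries, frames):
    """Carry every query through all frames.

    The track identity is the query itself, so no detections are matched
    between adjacent frames (assumption: a toy model of implicit tracking).
    """
    trajectories = {q.track_id: [] for q in queries}
    for candidates in frames:
        for q in queries:
            q.position = attend(q.position, candidates)
            trajectories[q.track_id].append(q.position)
    return trajectories


# Two text instances drifting toward each other over three frames.
frames = [[10.0, 50.0], [12.0, 48.0], [14.0, 46.0]]
queries = [TextQuery(0, 10.0), TextQuery(1, 50.0)]
trajectories = track(queries, frames)
```

Here `trajectories[0]` follows the instance that starts near 10 and `trajectories[1]` the one that starts near 50, with the identities fixed by the queries rather than by a per-frame association step.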

Wu, Weijia, Debing Zhang, Ying Fu, Chunhua Shen, Hong Zhou, Yuanqiang Cai, and Ping Luo. "End-to-End Video Text Spotting with Transformer." arXiv preprint arXiv:2203.10539 (2022).

Source code