method: TH-DN 2023-03-28

Authors: Ning Ding, Kemeng Zhao, Gang Yao, Pei Tang, Haodong Shi, Liangrui Peng

Affiliation: Tsinghua University

Email: dn22@mails.tsinghua.edu.cn

Description: The TH-DN method comprises detection, tracking, and recognition modules. For detection, YOLOX[1] with a ResNet50 backbone is employed. For multi-object tracking, ByteTrack[2] is used, which additionally associates low-score detection boxes: their similarity to existing tracklets is used to recover true objects and to filter out background detections. For recognition, an encoder-decoder architecture is adopted: the backbone is a variant of ResNet, the encoder is a bidirectional LSTM network, and the decoder is a Transformer module.
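The two-stage association idea behind the tracking step can be illustrated with a minimal sketch. This is not the TH-DN implementation: the function names, the thresholds (`high_thresh`, `low_thresh`, `min_iou`), and the use of greedy IoU matching in place of ByteTrack's Kalman-filter prediction and Hungarian assignment are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: List[Box], min_iou: float):
    """Greedily pair tracks with detections by descending IoU.

    Hypothetical helper: a simplification of the Hungarian assignment
    used in ByteTrack. Returns ({track_id: det_index}, unmatched_track_ids).
    """
    pairs = sorted(
        ((iou(tb, db), tid, di)
         for tid, tb in tracks.items()
         for di, db in enumerate(dets)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in matches or di in used:
            continue
        matches[tid] = di
        used.add(di)
    unmatched = [tid for tid in tracks if tid not in matches]
    return matches, unmatched


def associate(tracks: Dict[int, Box],
              detections: List[Tuple[Box, float]],
              high_thresh: float = 0.6,   # assumed value
              low_thresh: float = 0.1,    # assumed value
              min_iou: float = 0.3):      # assumed value
    """Two-stage association: high-score boxes first, then low-score boxes."""
    high = [box for box, s in detections if s >= high_thresh]
    low = [box for box, s in detections if low_thresh <= s < high_thresh]

    # Stage 1: match all tracks against high-score detections.
    first, remaining = greedy_match(tracks, high, min_iou)
    result = {tid: high[di] for tid, di in first.items()}

    # Stage 2: tracks left over are matched against low-score detections.
    # Low-score boxes that match no track are treated as background and dropped.
    leftover = {tid: tracks[tid] for tid in remaining}
    second, lost = greedy_match(leftover, low, min_iou)
    result.update({tid: low[di] for tid, di in second.items()})
    return result, lost
```

For example, with tracks `{1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}` and detections `[((1, 1, 11, 11), 0.9), ((21, 21, 31, 31), 0.3), ((50, 50, 60, 60), 0.2)]`, track 1 is matched in the first stage, track 2 is recovered from a low-score box in the second stage, and the isolated low-score box is discarded. The sketch omits starting new tracks from unmatched high-score detections, which a full tracker would also handle.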

[1] Ge Z, Liu S, Wang F, et al. YOLOX: Exceeding YOLO Series in 2021. arXiv preprint arXiv:2107.08430, 2021.

[2] Zhang Y, Sun P, Jiang Y, et al. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. In ECCV, 2022: 1-21.