Method: Road video text spotting (2023-03-19)
Authors: Tong xun quan; Zhang shen; Zheng juan juan
Affiliation: tongxq@mail2.sysu.edu.cn
Email: tongxq@mail2.sysu.edu.cn
Description: Video text detection, tracking, and recognition in road scenes place high demands on inter-frame correlation. Current methods remain sensitive to fast motion and, in particular, to different instances with similar appearance, both of which often cause inter-frame correlation failures. To address this, we propose a framework based on contrastive learning that exploits long-range temporal information spanning multiple frames to make the associations between instance embeddings more discriminative.
Specifically, we learn an embedding space in which inter-frame contrastive learning enforces similarity between embeddings of the same instance across different frames and dissimilarity between embeddings of different instances across all frames. This matters most for different instances of the same class with similar appearance, where it makes the associations more discriminative. Instance association nevertheless remains challenging in long videos with complex motion and severe occlusion. To handle these cases, we use time-weighted softmax scores and a memory-based association strategy to match instances and improve association quality.
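The two ideas above can be illustrated with a minimal sketch. It is not the submitted implementation: the InfoNCE-style form of the loss, the exponential recency weighting, and every name and default below (contrastive_loss, associate, decay, temperature, min_score) are assumptions chosen for illustration, since the description does not specify them.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v)) or 1.0
    return [x / norm for x in v]


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form) for one anchor embedding.

    positives: embeddings of the same instance in other frames.
    negatives: embeddings of different instances in any frame.
    """
    if not positives:
        raise ValueError("at least one positive embedding is required")
    a = _normalize(anchor)
    pos = [math.exp(_dot(a, _normalize(p)) / temperature) for p in positives]
    neg_sum = sum(math.exp(_dot(a, _normalize(n)) / temperature) for n in negatives)
    # Mean over positives of -log(pos / (pos + sum of negatives)).
    return sum(-math.log(p / (p + neg_sum)) for p in pos) / len(pos)


def associate(query, memory, current_frame, decay=0.9, temperature=0.1, min_score=0.5):
    """Match a detection embedding against a memory bank of tracks.

    memory maps track_id -> list of (frame_index, embedding).
    Older entries are down-weighted by decay ** age (assumed weighting),
    then a softmax over tracks yields the matching scores.
    Returns (best_track_id or None, scores_by_track).
    """
    if not memory:
        return None, {}
    q = _normalize(query)
    sims = {}
    for track_id, entries in memory.items():
        weights = [decay ** (current_frame - frame) for frame, _ in entries]
        weighted = sum(w * _dot(q, _normalize(e)) for w, (_, e) in zip(weights, entries))
        sims[track_id] = weighted / sum(weights)
    # Numerically stable softmax over the time-weighted similarities.
    top = max(sims.values())
    exps = {t: math.exp((s - top) / temperature) for t, s in sims.items()}
    total = sum(exps.values())
    scores = {t: e / total for t, e in exps.items()}
    best = max(scores, key=scores.get)
    # Below the threshold, the detection would start a new track.
    return (best if scores[best] >= min_score else None), scores
```

For example, with memory = {1: [(0, [1.0, 0.0]), (1, [0.9, 0.1])], 2: [(0, [0.0, 1.0])]}, the call associate([1.0, 0.05], memory, current_frame=2) assigns the detection to track 1. Keeping several past embeddings per track is what lets an instance be re-identified after an occlusion, while the decay keeps stale appearance from dominating the score.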
Our framework strengthens inter-frame correlation and provides more robust instance association for video text detection, tracking, and recognition in road scenes. Experimental results demonstrate the effectiveness of the approach and its potential for practical applications.