- Task 1 - Text Localization
- Task 2 - Single Image End-to-End Recognition
- Task 3 - Multi Image End-to-End Recognition
Method: Multi-View Fusion Network for Text Spotting
Date: 2024-08-04
Authors: Ning Ding, Gang Yao, Kemeng Zhao, Yifan Huang, Liangrui Peng
Affiliation: Tsinghua University
Description: We present the Multi-View Fusion Network (MVFN) for text spotting. First, MVFN is trained on multi-view images, allowing it to capture textual features from diverse viewpoints: after extracting the image feature maps, we compute the correlation between the main-view feature maps and the reference-image feature maps, then fuse them with a convolutional layer. Second, the model aggregates detection and recognition results across the multiple views, exploiting their complementary information for text spotting. Finally, we employ a copy-paste data augmentation technique to simulate text occlusions.
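The correlate-then-fuse step in the description can be illustrated with a small sketch. It is not the authors' implementation: the description does not say how the correlation is computed or what kernel size the fusion layer uses, so the per-location cosine similarity, the 1x1 convolution, and the names `cosine`, `correlation_map`, and `fuse` are all assumptions made for illustration. Feature maps are plain nested lists of shape H x W x C.

```python
import math


def cosine(u, v):
    """Cosine similarity of two channel vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def correlation_map(main, ref):
    """Assumed correlation: per-location cosine similarity of two H x W x C maps."""
    return [[cosine(m, r) for m, r in zip(m_row, r_row)]
            for m_row, r_row in zip(main, ref)]


def fuse(main, ref, weights, bias):
    """Assumed fusion: a 1x1 convolution over [main, ref, correlation] channels.

    weights has shape C_out x (2 * C + 1); bias has length C_out.
    """
    corr = correlation_map(main, ref)
    fused = []
    for m_row, r_row, c_row in zip(main, ref, corr):
        out_row = []
        for m, r, c in zip(m_row, r_row, c_row):
            x = list(m) + list(r) + [c]  # concatenate along the channel axis
            out_row.append([sum(w * xi for w, xi in zip(w_out, x)) + b
                            for w_out, b in zip(weights, bias)])
        fused.append(out_row)
    return fused


# Toy 1 x 2 feature maps with 2 channels each (made-up values).
main_view = [[[1.0, 0.0], [0.0, 2.0]]]
reference = [[[1.0, 0.0], [2.0, 0.0]]]
# Output channel 0 copies main channel 0; output channel 1 copies the correlation.
weights = [[1, 0, 0, 0, 0], [0, 0, 0, 0, 1]]
bias = [0.0, 0.0]
fused = fuse(main_view, reference, weights, bias)
```

In this toy input the first location has identical main and reference vectors (correlation 1) and the second has orthogonal ones (correlation 0), so the fused output shows how agreement between views can be carried into the fused features. A trained model would learn the weights rather than fix them by hand.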
Date | Method | Occluded F-score | Occluded Recall | General Precision | General Recall | General F-score | Occluded Visible Recall | Occluded Inferable Recall | Occluded Indeterminate Recall
---|---|---|---|---|---|---|---|---|---
2024-08-04 | Multi-View Fusion Network for Text Spotting | 0.95% | 0.61% | 2.16% | 2.25% | 2.20% | 0.78% | 1.52% | 0.48%
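
As a sanity check on how to read the row above, the sketch below recomputes an F-score from the reported precision of 2.16% and recall of 2.25%. It assumes the leaderboard uses the standard F1 definition (the harmonic mean of precision and recall), which the page itself does not state; `f_score` is a helper name introduced here.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values in percent, taken from the results row.
general_f = f_score(2.16, 2.25)
print(round(general_f, 2))  # 2.2, consistent with the reported 2.20%
```

The agreement suggests that the 2.16%, 2.25%, and 2.20% figures belong together as one precision, recall, and F-score triple.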