Authors: Ning Ding, Gang Yao, Kemeng Zhao, Yifan Huang, Liangrui Peng

Affiliation: Tsinghua University

Description: We present the Multi-View Fusion Network (MVFN) for text spotting. First, MVFN is trained on multi-view images so that it captures textual features from diverse viewpoints: after extracting the image feature maps, we compute the correlation between the main-view feature maps and the reference-image feature maps, and then fuse them with a convolutional layer. Second, the model aggregates the detection and recognition results from the multiple views, exploiting the complementary information that different views provide. Finally, we employ a copy-paste data augmentation technique to simulate text occlusions.
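
The correlate-then-fuse step in the description can be illustrated with a minimal sketch. The description does not say how the correlation is computed or how the convolution is configured, so everything here is an assumption: the correlation is taken to be a per-location cosine similarity, the fusion is a 1x1 convolution over the main-view channels stacked with the correlation-weighted reference channels, and the names correlate and fuse are hypothetical. Feature maps are plain nested lists of shape C x H x W so that the example runs without any dependencies.

```python
import math


def correlate(main, ref):
    """Per-location cosine similarity between two C x H x W feature maps.

    Assumption: the submission only says "correlation"; cosine similarity
    is one plausible choice, not necessarily the one MVFN uses.
    """
    channels, height, width = len(main), len(main[0]), len(main[0][0])
    corr = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            m = [main[c][y][x] for c in range(channels)]
            r = [ref[c][y][x] for c in range(channels)]
            dot = sum(a * b for a, b in zip(m, r))
            norm = math.sqrt(sum(a * a for a in m)) * math.sqrt(sum(b * b for b in r))
            corr[y][x] = dot / norm if norm > 0 else 0.0
    return corr


def fuse(main, ref, weights, bias):
    """Fuse two views with a 1x1 convolution (hypothetical configuration).

    The 2C input channels are the main-view channels followed by the
    reference channels scaled by the correlation map. weights has shape
    C_out x 2C and bias has length C_out.
    """
    channels, height, width = len(main), len(main[0]), len(main[0][0])
    corr = correlate(main, ref)
    weighted_ref = [
        [[corr[y][x] * ref[c][y][x] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
    stacked = list(main) + weighted_ref
    return [
        [
            [b + sum(w * stacked[i][y][x] for i, w in enumerate(w_row))
             for x in range(width)]
            for y in range(height)
        ]
        for w_row, b in zip(weights, bias)
    ]


# Two channels, one row, two locations.
main_view = [[[1.0, 0.0]], [[0.0, 2.0]]]
ref_view = [[[1.0, 3.0]], [[0.0, 0.0]]]
fused = fuse(main_view, ref_view, weights=[[1.0, 0.0, 1.0, 0.0]], bias=[0.0])
```

In this toy input the two views agree at the first location and are orthogonal at the second, so the reference view contributes only where it correlates with the main view. A real model would learn the convolution weights and would more likely operate on tensors in a deep learning framework.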

Ranking Table

Date: 2024-08-04
Method: Multi-View Fusion Network for Text Spotting

Group                        Metric                  Value
Occluded                     F-score                 0.95%
Occluded                     Recall                  0.61%
General                      Precision               2.16%
General                      Recall                  2.25%
General                      F-score                 2.20%
Occluded Subcategory Recall  Occluded Visible        0.78%
Occluded Subcategory Recall  Occluded Inferable      1.52%
Occluded Subcategory Recall  Occluded Indeterminate  0.48%

Ranking Graphic
