Authors: Xuhua Ren, Lu Wang
Description: Scene text recognition is an important component of many vision-and-language tasks. Recognizing out-of-vocabulary (OOV) words remains a challenge, and some studies suggest distinguishing between in-vocabulary (IV) and OOV words. To address this issue, we make two contributions. First, we propose a pseudo-label generation module that combines character detection with image inpainting to produce a substantial amount of training data. Second, we introduce an approach that optimizes geodesic distance margins to reduce the impact of noisy pseudo-labels on model convergence during training.
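
The description does not specify how the geodesic distance margins are formulated. As a rough illustration only, the sketch below uses one common way to put a margin on geodesic (angular) distance on the unit hypersphere: an additive angular margin on the target class, in the style of ArcFace-type losses. The function names, the `margin` and `scale` values, and the idea of assigning a smaller margin to a suspected noisy pseudo-label are all assumptions made for this example, not the authors' method.

```python
import math

def angular_margin_logits(cosines, target, margin=0.5, scale=30.0):
    """Add `margin` (radians) to the target-class angle, then rescale.

    cosines: cosine similarities between a normalized feature and each
             normalized class weight, one value per class.
    target:  index of the (pseudo-)label class.
    Hypothetical helper for illustration; not taken from the paper.
    """
    logits = []
    for i, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))        # guard acos against rounding error
        theta = math.acos(c)              # geodesic distance on the unit sphere
        if i == target:
            theta = min(theta + margin, math.pi)  # keep the angle in [0, pi]
        logits.append(scale * math.cos(theta))
    return logits

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

# Same sample scored with a large margin (assumed trusted label) and a
# small margin (assumed noisy pseudo-label).
cosines = [0.8, 0.1, -0.3]
loss_large_margin = cross_entropy(angular_margin_logits(cosines, 0, margin=0.5), 0)
loss_small_margin = cross_entropy(angular_margin_logits(cosines, 0, margin=0.1), 0)
```

In this sketch a smaller margin yields a smaller loss for the same sample, so lowering the margin for a suspect pseudo-label is one plausible way to limit its pull on training. How the margins are actually chosen or optimized in the proposed approach is not stated here.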