method: CLOVA OCR (2019-04-22)
Authors: Sungrae Park, Seung Shin, Seonghyeon Kim, Jaeheung Surh, Junyeop Lee, Hwalsuk Lee
Description: Our model consists of a ResNet-based backbone, a sequence model, and an attention-based decoder [1]. The backbone combines ResNet with SENet (squeeze-and-excitation network) [2]; the other components follow Baek et al. [1]. We trained the model on our own synthetic datasets using virtual adversarial training (VAT) [3]. For this competition, we fine-tuned the model on the SROIE training dataset. The recognizer reads the text inside the text boxes detected by CRAFTS [4].
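
To illustrate the squeeze-and-excitation idea used in the backbone, here is a minimal sketch of a single SE gate in plain Python. It is not the CLOVA OCR implementation: the function name `se_block`, the weight arguments `w1` and `w2`, and the omission of bias terms are assumptions made for illustration, and a real model would use a tensor library with learned weights inside each residual block.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply a squeeze-and-excitation gate to a feature map (illustrative only).

    feature_map: list of C channels, each a list of H rows of W floats.
    w1: hypothetical reduction weights, shape (C_reduced, C).
    w2: hypothetical expansion weights, shape (C, C_reduced).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: reduce the channel dimension, then ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w1
    ]
    # Excitation, step 2: expand back to C channels, then sigmoid,
    # so every gate lies strictly between 0 and 1.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Two 2x2 channels and toy weights (reduction from 2 channels to 1).
feature_map = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
w1 = [[1.0, 1.0]]
w2 = [[0.0], [1.0]]
scaled, gates = se_block(feature_map, w1, w2)
```

With these toy weights the first channel receives a gate of exactly 0.5 (sigmoid of zero) and the second a gate close to 1, so the block suppresses one channel relative to the other based on globally pooled statistics.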