Authors: Sungrae Park, Seung Shin, Seonghyeon Kim, Jaeheung Surh, Junyeop Lee, Hwalsuk Lee
Description: Our model consists of a ResNet-based backbone, a sequence model, and an attention-based decoder. The backbone combines ResNet with SENet (squeeze-and-excitation network), and the sequence model and decoder follow Baek et al. We trained the model on our own synthetic datasets using virtual adversarial training (VAT). For this competition, we fine-tuned the model on the SROIE training dataset. At inference time, the recognizer read the text inside the text boxes detected by CRAFTS.
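The SENet component of the backbone can be illustrated with a minimal sketch of a single squeeze-and-excitation block. It is written in pure Python on nested lists so it runs without a deep learning framework. The function name `squeeze_excite`, the weight layout, and the reduction ratio are illustrative assumptions and do not reflect the authors' actual implementation. The block pools each channel to a single value, passes these values through a small bottleneck (FC, ReLU, FC, sigmoid) to get one gate per channel, and then rescales each channel by its gate.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(feature_map, w1, b1, w2, b2):
    """Hypothetical squeeze-and-excitation block on a C x H x W feature map.

    feature_map: list of C channels, each an H x W list of floats.
    w1, b1: reduction layer, shape (C // r) x C and (C // r).
    w2, b2: expansion layer, shape C x (C // r) and (C).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

    # Excitation: FC -> ReLU reduces C values to C // r hidden units ...
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z)) + b)
              for row, b in zip(w1, b1)]
    # ... then FC -> sigmoid expands back to one gate in (0, 1) per channel.
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
             for row, b in zip(w2, b2)]

    # Scale: multiply every spatial position of each channel by its gate.
    return [[[g * v for v in row] for row in ch]
            for ch, g in zip(feature_map, gates)]


# Usage: 2 channels of 2 x 2, reduction ratio r = 2 (one hidden unit).
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[2.0, 2.0], [2.0, 2.0]]]
out = squeeze_excite(fmap, w1=[[1.0, 0.0]], b1=[0.0],
                     w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
```

In an SE-ResNet, this gating is typically applied to the output of each residual branch before the identity shortcut is added. The backbone can then emphasize informative channels at very little extra cost.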