Authors: Feng Cheng, Lixin Gu, Qingjie Liu, Feng Han, Jingtao Han
Description: The detection model and recognition model are trained separately.
Detection model: Based on Mask R-CNN, with multi-scale processing. Train-set: 2017 MLT task1 train-set.
Recognition model: Based on a Transformer with a ResNet50 backbone. A voting process identifies the language of each recognized transcript. Train-set: 2017 MLT task2 train-set, 2019 MLT task2 train-set, and 2019 MLT Synthetic dataset.
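Illustration of the voting step: the description does not say what the votes are cast over, so the following is only a minimal sketch under the assumption that each recognized character votes for its script and the majority wins. The names PREFIX_TO_SCRIPT, char_script, and vote_script, the label set, and the fallback to Latin are all hypothetical and not taken from the submission.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from the first word of a character's Unicode name
# to a script label. The label set here is an assumption for illustration.
PREFIX_TO_SCRIPT = {
    "LATIN": "Latin",
    "ARABIC": "Arabic",
    "BENGALI": "Bangla",
    "CJK": "Chinese",
    "HIRAGANA": "Japanese",
    "KATAKANA": "Japanese",
    "HANGUL": "Korean",
    "DEVANAGARI": "Hindi",
}


def char_script(ch):
    """Return the script label for one character, or None if it casts no vote."""
    name = unicodedata.name(ch, "")  # "" for unnamed characters
    prefix = name.split(" ", 1)[0] if name else ""
    return PREFIX_TO_SCRIPT.get(prefix)


def vote_script(transcript, default="Latin"):
    """Majority vote over per-character scripts (assumed voting scheme)."""
    votes = Counter(s for s in map(char_script, transcript) if s is not None)
    if not votes:
        # Digits, punctuation, and whitespace cast no vote; fall back (assumption).
        return default
    return votes.most_common(1)[0][0]
```

For example, vote_script("hello 世") gives five Latin votes against one Chinese vote. Kanji and kana would vote for different labels under this simple mapping, so a real system would need a more careful rule for mixed Japanese text.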