method: yyvis 2022-07-21

Authors: yuanye

Affiliation: yyvis

Description: Model: For text detection, we used DBNet++. For text recognition, we used a ViT backbone with two output heads: one uses a CTC mechanism and the other uses an attention mechanism. The prediction with the higher score is taken as the recognition result.
Data: Our text detector was trained only on the official training data. The text recognizer was trained on the official data plus an extra 10M synthetic samples.
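
The two-head selection described above might look roughly like the sketch below. It is an illustration only, not the submitted implementation: the function names, the blank index, the greedy CTC decoding, and the use of the mean per-step maximum probability as the score are all assumptions, since the description does not say how each head's score is computed.

```python
from typing import List, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(
    probs: Sequence[Sequence[float]], charset: str
) -> Tuple[str, float]:
    """Greedy CTC decoding: argmax per step, collapse repeats, drop blanks.

    The score is the mean of the per-step maximum probabilities, which is
    an assumed confidence measure, not one stated in the description.
    """
    chars: List[str] = []
    step_scores: List[float] = []
    prev = BLANK
    for step in probs:
        idx = max(range(len(step)), key=step.__getitem__)
        step_scores.append(step[idx])
        # Emit a character only when it is not blank and not a repeat.
        if idx != BLANK and idx != prev:
            chars.append(charset[idx - 1])  # index 0 is reserved for blank
        prev = idx
    score = sum(step_scores) / len(step_scores) if step_scores else 0.0
    return "".join(chars), score


def select_prediction(
    ctc: Tuple[str, float], attn: Tuple[str, float]
) -> Tuple[str, float]:
    """Return the (text, score) pair with the higher score; ties favour CTC."""
    return ctc if ctc[1] >= attn[1] else attn


if __name__ == "__main__":
    # Toy per-step probabilities over [blank, 'a', 'b'].
    toy_probs = [
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
    ]
    ctc_result = ctc_greedy_decode(toy_probs, "ab")
    attn_result = ("abc", 0.9)  # hypothetical attention-head output
    print(select_prediction(ctc_result, attn_result))
```

Comparing raw scores from the two heads assumes they are on a comparable scale; in practice some form of length normalisation or calibration may be needed before the comparison is meaningful.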