Affiliation: dtwave technology
Description: We use only a simple transformer model with 2D attention, label smoothing, and beam search. No extra language model is used. The datasets we use are ArT, ReCTS, RCTW, LSVT, MLT, and MTWI. Our data augmentation includes Gaussian noise, motion blur, random perspective transformation, and so on. All samples are resized to the same height (48) before being fed into the network. The result is generated by a single model without ensembling.
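
Two of the steps named above, resizing every sample to height 48 and label smoothing, can be sketched in a few lines. This is only an illustrative sketch, not the submission's code: the function names `resized_size` and `smooth_labels` are hypothetical, preserving the aspect ratio during resizing is an assumption (the description fixes only the height), and the smoothing factor 0.1 is an assumed value that the description does not state.

```python
TARGET_HEIGHT = 48  # the height stated in the description


def resized_size(width, height, target_height=TARGET_HEIGHT):
    """Return (new_width, new_height) for a sample scaled to target_height.

    Assumption: the aspect ratio is preserved, so the width scales by the
    same factor as the height. The description does not confirm this.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Return a label-smoothed target distribution over num_classes.

    A mass of epsilon is spread uniformly over all classes and the
    remaining 1 - epsilon is added to the true class.
    epsilon=0.1 is an assumed value, not taken from the description.
    """
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index out of range")
    uniform = epsilon / num_classes
    dist = [uniform] * num_classes
    dist[target_index] += 1.0 - epsilon
    return dist
```

For example, under these assumptions a 200x96 crop would be resized to 100x48, and with four classes the true class would receive a target probability of 0.925 instead of 1.0.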