method: rickyyds 2022-07-21

Authors: zhichao

Affiliation: rickyyds

Description: Detection model: DBNet++ with multi-scale training.
Recognition model: an encoder-decoder Transformer-based framework.
Encoder: 12 ViT-based blocks with a patch size of 4x4.
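To illustrate what a 4x4 patch size means for the encoder's sequence length, here is a minimal sketch. The 32x128 input size and the helper name `patch_grid` are assumptions made for illustration only; the description does not state the input resolution.

```python
def patch_grid(height, width, patch=4):
    """Return (rows, cols, tokens) for non-overlapping square patches."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    rows, cols = height // patch, width // patch
    return rows, cols, rows * cols


# Hypothetical 32x128 text-line crop (assumed; not given in the description).
rows, cols, tokens = patch_grid(32, 128)
print(rows, cols, tokens)  # 8 32 256
```

Under these assumed dimensions, each of the 12 encoder blocks would operate on 256 patch tokens.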
An ensemble strategy fuses the results from three types of decoder: a CTC-based decoder, an attention-based decoder, and a CTC+attention-based decoder.
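The description does not say how the three decoder outputs are fused. One plausible scheme, sketched below, is confidence-weighted voting: identical strings pool their confidence, and the string with the highest total wins. The function name `fuse_predictions` and the example scores are hypothetical and not taken from the submission.

```python
from collections import defaultdict


def fuse_predictions(candidates):
    """Fuse (text, confidence) pairs, one per decoder.

    Confidences of identical strings are summed; the string with the
    highest total is returned. Ties go to the string seen first.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    scores = defaultdict(float)
    for text, conf in candidates:
        scores[text] += conf
    return max(scores.items(), key=lambda kv: kv[1])[0]


# Hypothetical outputs from the CTC, attention, and CTC+attention decoders.
preds = [("hello", 0.80), ("hello", 0.70), ("hallo", 0.95)]
print(fuse_predictions(preds))  # hello
```

In this example two decoders agree on "hello", so their pooled confidence outweighs the single, more confident "hallo" prediction.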