Method: rickyyds
Date: 2022-07-21
Authors: zhichao
Affiliation: rickyyds
Description: Detection model: DBNet++ with multi-scale training.
Recognition model: an encoder-decoder Transformer-based framework.
Encoder: 12 ViT-based blocks with a patch size of 4x4.
Decoder: an ensemble strategy fuses the results of three types of decoder: a CTC-based decoder, an attention-based decoder, and a CTC+attention-based decoder.
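
The description does not say how the three decoder outputs are fused. The following is only a minimal sketch of one common option, confidence-weighted voting, and is not the authors' method. The names `Prediction`, `sequence_confidence`, and `fuse` are hypothetical, as is the assumption that each decoder returns a string together with per-character probabilities.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Prediction:
    """Hypothetical decoder output: text plus per-character probabilities."""
    text: str
    char_probs: List[float]


def sequence_confidence(pred: Prediction) -> float:
    """Geometric mean of the per-character probabilities (0.0 if empty)."""
    if not pred.char_probs:
        return 0.0
    log_sum = sum(math.log(max(p, 1e-12)) for p in pred.char_probs)
    return math.exp(log_sum / len(pred.char_probs))


def fuse(predictions: Dict[str, Prediction]) -> str:
    """Assumed fusion rule: decoders that agree on a string pool their
    confidence, and the string with the highest total wins."""
    if not predictions:
        raise ValueError("at least one decoder prediction is required")
    scores: Dict[str, float] = {}
    for pred in predictions.values():
        scores[pred.text] = scores.get(pred.text, 0.0) + sequence_confidence(pred)
    return max(scores, key=scores.get)


# Illustrative, made-up outputs for the three decoder types.
example = {
    "ctc": Prediction("hello", [0.9] * 5),
    "attention": Prediction("hallo", [0.99] * 5),
    "ctc_attention": Prediction("hello", [0.8] * 5),
}
print(fuse(example))  # prints "hello"
```

Under this rule, two decoders that agree can outvote a single, more confident decoder. Other fusion rules, such as taking only the single most confident output, are equally plausible readings of the description.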