Authors: Han Qiu, Guoqiang Xu, Chenjie Cao, Chao Gao, Dexun Wang, Fengxin Yang, Xiao Xie, Yu Qiu, Ziqi Zheng
Affiliation: PingAn OneConnect GammaLab
Description: 1. We train a DB model to detect word-level bounding boxes and then use line-level boxes to sort them.
2. We train a four-stage text recognition model (TPS-ResNet-BiLSTM-Attention).
3. We pretrain a discrete 2D-position embedding model (DEModel-large) with question generation and span masking, and fine-tune it to predict the start and end positions for certain questions.
4. Our cycled splitting and merging algorithm with K-means effectively filters out irrelevant answer boxes. In addition, we use post-processing (spell check) and data augmentation to further improve performance.
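
The description does not say how the line-level boxes are used to sort the word-level boxes in step 1. One plausible reading, given here only as a minimal sketch, is to assign each word box to the line box it overlaps most, order the lines top to bottom, and order the words left to right within each line. The names `Box`, `overlap_area`, and `sort_words_by_lines` are hypothetical, and the axis-aligned `(x_min, y_min, x_max, y_max)` box format is an assumption, not something stated in the submission.

```python
from typing import List, Tuple

# Assumed box format: axis-aligned (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def sort_words_by_lines(word_boxes: List[Box], line_boxes: List[Box]) -> List[int]:
    """Return indices of word_boxes in an approximate reading order."""
    # Rank lines top to bottom, breaking ties left to right.
    line_order = sorted(
        range(len(line_boxes)),
        key=lambda i: (line_boxes[i][1], line_boxes[i][0]),
    )
    rank = {line_idx: r for r, line_idx in enumerate(line_order)}

    keyed = []
    for word_idx, word in enumerate(word_boxes):
        areas = [overlap_area(word, line) for line in line_boxes]
        best = max(range(len(areas)), key=lambda i: areas[i]) if areas else None
        if best is None or areas[best] == 0.0:
            # Words that touch no line go last, ordered by their own position.
            key = (len(line_boxes), word[1], word[0])
        else:
            # Within a line, order words by their left edge.
            key = (rank[best], 0.0, word[0])
        keyed.append((key, word_idx))

    return [word_idx for _, word_idx in sorted(keyed)]


if __name__ == "__main__":
    words = [(60, 0, 100, 10), (0, 20, 40, 30), (0, 1, 50, 11), (50, 21, 90, 31)]
    lines = [(0, 18, 100, 32), (0, 0, 100, 12)]
    print(sort_words_by_lines(words, lines))
```

Sorting by line membership rather than by raw word coordinates keeps words on a slightly tilted or uneven line together, which is presumably why line-level boxes are used at all; the actual system may handle rotated or curved text differently.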