Authors: AARC & PKU Joint Team
Affiliation: AARC, Huawei & WICT, PKU
Description: Model 1 (extractive QA method): BERT-large pretrained on SQuAD + DocVQA and fine-tuned on the contest training data. The model was initialized from the checkpoint pretrained on SQuAD + DocVQA (https://github.com/mineshmathew/DocVQA/tree/master/BERT_baseline) and trained using only the text part of the OCR annotations. The final output was chosen by voting among three models trained with different hyperparameters.
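The voting step for Model 1 can be sketched roughly as below. This is only an illustration: the description does not specify the voting rule, so the majority vote, the lowercase/whitespace normalization, the tie-breaking in favor of the earliest model, and the function name `vote` are all assumptions.

```python
from collections import Counter


def vote(predictions):
    """Pick one answer from several models' predictions for a single question.

    Assumptions (not stated in the description): answers are compared after
    lowercasing and stripping whitespace, empty answers are ignored, and ties
    are broken in favor of the model listed first.
    """
    normalized = [p.strip().lower() for p in predictions]
    counts = Counter(a for a in normalized if a)
    if not counts:
        # Every model returned an empty answer.
        return ""
    best = max(counts.values())
    for original, norm in zip(predictions, normalized):
        if norm and counts[norm] == best:
            return original.strip()
    return ""


if __name__ == "__main__":
    # Hypothetical predictions from three models for one question.
    print(vote(["42%", "42%", "40%"]))
```

When all three models disagree, this sketch falls back to the first model's answer; a confidence-weighted vote would be another reasonable choice.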
Model 2 (VLM method): SS-Baseline trained on TextVQA, ST-VQA, and the contest training data. The model was initialized from BERT-base and trained jointly on the three VQA datasets. max_ocr_num was set to 1000, and we did not use the RecogCNN features.
Post-processing and model ensemble: We applied rule-based post-processing to three types of questions: selection, inverse percentage, and summation. We obtained the final result by filling the empty predictions of Model 1 with the answers predicted by Model 2.
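The ensemble step, in which Model 2 fills in for Model 1 wherever Model 1 produced no answer, might look like the following minimal sketch. The dictionary layout (question ID mapped to answer string), the treatment of whitespace-only answers as empty, and the name `fill_empty` are assumptions made for illustration.

```python
def fill_empty(model1_answers, model2_answers):
    """Merge two prediction sets, preferring Model 1 when it gave an answer.

    Both arguments are assumed to map question IDs to answer strings.
    A missing, None, or whitespace-only answer counts as empty.
    """
    merged = {}
    for qid, answer in model1_answers.items():
        if answer and answer.strip():
            merged[qid] = answer
        else:
            # Fall back to Model 2; stay empty if it has no answer either.
            merged[qid] = model2_answers.get(qid, "")
    return merged


if __name__ == "__main__":
    # Hypothetical predictions keyed by question ID.
    m1 = {"q1": "Paris", "q2": ""}
    m2 = {"q1": "London", "q2": "17%"}
    print(fill_empty(m1, m2))
```

Model 2's prediction is only consulted when Model 1 is empty, so Model 1 always takes priority when both models answer.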