Authors: Jungjun Kim, Teakgyu Hong, Hyungmin Lee, Junbum Cha, Sungrae Park
Affiliation: NAVER Corp.
Description: We used CLOVA OCR to obtain OCR results for the images, then preprocessed those results so that the task could be solved with an extractive QA method. For preprocessing, we followed HyperDQA's approach. To train the extractive QA model, we first pre-trained a BROS model (with a slight modification: sharing parameters between the projection matrices in self-attention) on the IIT-CDIP dataset. We then performed additional pre-training on the SQuAD and WikitableQa datasets. Finally, we fine-tuned the model on the DocVQA dataset and used it to predict the answers.
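
The extractive QA step can be illustrated with a minimal sketch. It is not the authors' code: it assumes the fine-tuned model emits one start score and one end score per OCR token, and the names `best_span`, `extract_answer`, and `max_len` are hypothetical. The answer is taken to be the contiguous token span that maximizes the sum of its start and end scores.

```python
from typing import List, Tuple


def best_span(start_scores: List[float],
              end_scores: List[float],
              max_len: int = 10) -> Tuple[int, int]:
    """Return (start, end) token indices of the highest-scoring span.

    Assumption: a span's score is start_scores[start] + end_scores[end],
    with start <= end and at most max_len tokens in the span.
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        # Only consider end positions at or after the start position.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


def extract_answer(tokens: List[str],
                   start_scores: List[float],
                   end_scores: List[float],
                   max_len: int = 10) -> str:
    """Join the OCR tokens of the best span into an answer string."""
    i, j = best_span(start_scores, end_scores, max_len)
    return " ".join(tokens[i:j + 1])


# Illustrative OCR tokens and made-up scores (not real model output).
tokens = ["Invoice", "Date:", "12", "March", "1998", "Total"]
start_scores = [0.1, 0.2, 2.5, 0.3, 0.1, 0.0]
end_scores = [0.0, 0.1, 0.4, 0.6, 3.0, 0.2]
answer = extract_answer(tokens, start_scores, end_scores)
```

Because the answer is always copied from the OCR tokens, the quality of the OCR output and of the preprocessing that orders those tokens bounds what this step can recover.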