Method: NAVER CLOVA

Date: 2021-04-11

Authors: Jungjun Kim, Teakgyu Hong, Hyungmin Lee, Junbum Cha, Sungrae Park

Affiliation: NAVER Corp.

Email: teakgyu.hong@navercorp.com

Description: We used CLOVA OCR to extract text from the document images, then preprocessed the OCR results so that the task could be solved as extractive QA. For preprocessing, we followed HyperDQA's approach. To train the extractive QA model, we first pre-trained a BROS [1] model on the IIT-CDIP dataset, with one slight modification: parameters are shared between the projection matrices in self-attention. We then performed additional pre-training on the SQuAD and WikitableQa datasets. Finally, we fine-tuned the model on the DocVQA dataset to obtain the answers.

[1] https://openreview.net/pdf?id=punMXQEsPr0
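
To illustrate the extractive QA formulation described above, here is a minimal sketch in plain Python. It is not the authors' code and does not reproduce HyperDQA's preprocessing or the BROS model. It assumes that the OCR output has already been flattened into a list of word tokens and that a model has produced one start score and one end score per token. The function names `find_answer_span` and `best_span`, the `max_len` parameter, and the example tokens and scores are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def find_answer_span(tokens: List[str], answer: str) -> Optional[Tuple[int, int]]:
    """Locate a ground-truth answer inside the OCR tokens (training label).

    Returns inclusive (start, end) token indices of the first
    case-insensitive match, or None if the answer is not present.
    """
    target = answer.lower().split()
    if not target:
        return None
    lowered = [t.lower() for t in tokens]
    n = len(target)
    for start in range(len(lowered) - n + 1):
        if lowered[start:start + n] == target:
            return start, start + n - 1
    return None


def best_span(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    max_len: int = 30,
) -> Tuple[int, int]:
    """Pick the span (i, j), i <= j, that maximizes start_scores[i] + end_scores[j].

    Spans longer than max_len tokens are not considered.
    """
    if len(start_scores) != len(end_scores) or not start_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Hypothetical OCR tokens and model scores, for illustration only.
tokens = ["Invoice", "No.", "12345", "Date:", "11", "April", "2021"]
label = find_answer_span(tokens, "11 april 2021")

start_scores = [0.1, 0.0, 0.2, 0.1, 2.5, 0.3, 0.2]
end_scores = [0.0, 0.1, 0.3, 0.0, 0.2, 0.4, 3.0]
i, j = best_span(start_scores, end_scores)
predicted = " ".join(tokens[i:j + 1])
print(label, predicted)
```

In this sketch the training label and the predicted answer are both expressed as token index pairs, which is what makes the task extractive: the answer is always copied from the OCR text rather than generated.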