method: DocGptVQA (2023-04-20)

Authors: Ren Zhou, Qiaoling Deng, Xinfeng Chang, Luyan Wang, Xiaochen Hu, Hui Li, Yaqiang Wu

Affiliation: Lenovo Research

Description: We combined the prediction outputs of the UDOP and Blip2 models to improve our results. We also optimized the image encoder and added page-number features to handle multi-page documents, and used GPT to generate Python-like modular programs.

method: DocBlipVQA (2023-04-16)

Authors: Ren Zhou, Qiaoling Deng, Xinfeng Chang, Luyan Wang, Xiaochen Hu, Hui Li, Yaqiang Wu

Affiliation: Lenovo Research

Description: We combined the prediction outputs of the UDOP and Blip2 models to improve our results. We also optimized the image encoder and added page-number features to handle multi-page documents.

method: model_0327 (2023-03-27)

Authors: ZR, QL

Description: Model: multimodal T5
Data: DUDE training data
OPT: OCR_CV

Ranking Table

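Column groups: Answer (ANLS), Calibration (ECE, AURC), OOD Detection (AUROC), ANLS per answer type (Extractive, Abstractive, List of answers, Unanswerable).

| Date | Method | ANLS | ECE | AURC | AUROC | Extractive | Abstractive | List of answers | Unanswerable |
|---|---|---|---|---|---|---|---|---|---|
| 2023-04-20 | DocGptVQA | 0.5002 | 0.2240 | 0.4210 | 0.8744 | 0.5186 | 0.4832 | 0.2822 | 0.6204 |
| 2023-04-16 | DocBlipVQA | 0.4762 | 0.3065 | 0.4860 | 0.7829 | 0.5069 | 0.4631 | 0.3073 | 0.5522 |
| 2023-03-27 | model_0327 | 0.4659 | 0.1904 | 0.4398 | 0.8854 | 0.5521 | 0.4660 | 0.1786 | 0.4726 |
| 2023-03-16 | T5-concat | 0.3867 | 0.2489 | 0.4343 | 0.5113 | 0.3727 | 0.3750 | 0.1681 | 0.5289 |
| 2023-04-20 | Multi-Modal T5 VQA | 0.3790 | 0.5931 | 0.5931 | 0.5000 | 0.4155 | 0.4024 | 0.2021 | 0.3467 |
| 2023-04-19 | Multi-Modal T5 VQA | 0.3789 | 0.5931 | 0.5931 | 0.5000 | 0.4154 | 0.4022 | 0.2031 | 0.3467 |
| 2023-04-18 | Hi-VT5-beamsearch | 0.3574 | 0.6104 | 0.6104 | 0.5000 | 0.2831 | 0.3298 | 0.1060 | 0.6290 |
| 2023-04-21 | Hi-VT5-beamsearch with token type embeddings | 0.3559 | 0.2803 | 0.4603 | 0.4876 | 0.3095 | 0.3515 | 0.1176 | 0.5250 |
| 2023-04-26 | QAP | 0.1159 | 0.4168 | 0.9076 | 0.5014 | 0.0009 | 0.0007 | 0.0000 | 0.6199 |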
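
The ANLS columns report Average Normalized Levenshtein Similarity. As a rough illustration of how such a score is typically computed, the sketch below uses the common definition with a 0.5 threshold. It is not the benchmark's official scorer: the function names `levenshtein` and `anls`, the lowercasing and whitespace stripping, and the handling of empty strings are assumptions made for this example, and the official evaluation may treat list and unanswerable questions differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Mean over questions of the best thresholded similarity.

    predictions: list of predicted answer strings.
    references:  list of lists of accepted answer strings, one list per question.
    threshold:   normalized distances at or above this value score 0 (assumed 0.5).
    """
    scores = []
    for pred, golds in zip(predictions, references):
        best = 0.0
        for gold in golds:
            # Normalization below is an assumption of this sketch.
            p, g = pred.strip().lower(), gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            best = max(best, 1.0 - nl if nl < threshold else 0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    preds = ["lenovo", "xyz"]
    golds = [["Lenovo", "Lenovo Research"], ["abc"]]
    print(anls(preds, golds))
```

The threshold is what separates a near-miss, such as an OCR slip of one character, from a wrong answer: the former keeps most of its credit while the latter scores zero.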

Ranking Graphic
