method: HUMAN (2020-06-13)

Authors: Task1 Organizers

Affiliation: CVIT, IIIT Hyderabad, CVC-UAB, Amazon

Email: minesh.mathew@research.iiit.ac.in

Description: Human performance on the test set.
A small group of volunteers was asked to enter an answer for each question, given the question and its image.

method: PingAn-OneConnect-Gammalab-DQA (2020-05-16)

Authors: Han Qiu, Guoqiang Xu, Chenjie Cao, Chao Gao, Dexun Wang, Fengxin Yang, Xiao Xie, Yu Qiu, Ziqi Zheng

Affiliation: PingAn OneConnect GammaLab

Description: 1. We train a DB model to detect word-level bounding boxes, then use line-level boxes to sort them.
2. We train a four-stage text recognition model (TPS-ResNet-BiLSTM-Attention).
3. We pre-train a discrete 2D-position embedding model (DEModel-large) with question generation and span masking, then fine-tune it to predict answer start and end positions for the given questions.
4. Our cycled splitting and merging algorithm with K-means effectively filters out irrelevant answer boxes. In addition, we use post-processing (spell checking) and data augmentation to further improve performance.
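
Step 1 above (sorting word-level boxes with the help of line-level boxes) can be illustrated with a minimal sketch. This is not the team's implementation. The box format `(x_min, y_min, x_max, y_max)`, the rule of assigning a word to the first line that contains its center, and the names `center`, `contains`, and `sort_words_by_lines` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), origin at the top-left.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def contains(line: Box, point: Tuple[float, float]) -> bool:
    """Return True if the point lies inside (or on the border of) the line box."""
    x0, y0, x1, y1 = line
    px, py = point
    return x0 <= px <= x1 and y0 <= py <= y1


def sort_words_by_lines(words: List[Box], lines: List[Box]) -> List[int]:
    """Return word indices in an approximate reading order.

    Hypothetical helper: lines are ordered top-to-bottom (then left-to-right),
    each word is assigned to the first line containing its center, and words
    within a line are ordered by their left edge. Words that fall in no line
    are placed last.
    """
    line_order = sorted(range(len(lines)), key=lambda i: (lines[i][1], lines[i][0]))
    rank = {line_idx: r for r, line_idx in enumerate(line_order)}

    keyed = []
    for w_idx, word in enumerate(words):
        c = center(word)
        owner = next((i for i in line_order if contains(lines[i], c)), None)
        line_rank = rank[owner] if owner is not None else len(lines)
        keyed.append((line_rank, word[0], w_idx))
    return [w_idx for _, _, w_idx in sorted(keyed)]


if __name__ == "__main__":
    lines = [(0, 0, 100, 10), (0, 20, 100, 30)]
    words = [(50, 21, 70, 29), (10, 1, 30, 9), (5, 21, 25, 29), (40, 1, 60, 9)]
    print(sort_words_by_lines(words, lines))  # [1, 3, 2, 0]
```

Grouping by line before sorting by x-coordinate avoids interleaving words from neighbouring lines when boxes are slightly misaligned vertically, which is presumably why line-level boxes are used for the ordering.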

method: Structural LM-v2 (2020-05-14)

Authors: Structural LM Team

Description: 1. Pre-train Structural LM based on StructBERT
2. Use the official OCR recognition results directly

Ranking Table

Date        Method                          Score   Figure/Diagram  Form    Table/List  Layout  Free_text  Image/Photo  Handwritten  Yes/No  Others
2020-06-13  HUMAN                           0.9811  0.9756          0.9825  0.9780      0.9845  0.9839     0.9740       0.9717       0.9974  0.9828
2020-05-16  PingAn-OneConnect-Gammalab-DQA  0.8484  0.6059          0.9021  0.8463      0.8730  0.8337     0.5812       0.7692       0.5172  0.7289
2020-05-14  Structural LM-v2                0.7674  0.4931          0.8381  0.7621      0.7924  0.7596     0.4756       0.6282       0.5517  0.6549
2020-05-15  QA_Base_MRC_2                   0.7415  0.4854          0.8015  0.6738      0.7943  0.8136     0.5740       0.5831       0.5287  0.7161
2020-05-15  QA_Base_MRC_1                   0.7407  0.4890          0.7984  0.6675      0.7936  0.8131     0.5854       0.6099       0.4943  0.7384
2020-05-15  QA_Base_MRC_4                   0.7348  0.4735          0.8040  0.6647      0.7838  0.8043     0.5618       0.5810       0.4598  0.7332
2020-05-15  QA_Base_MRC_3                   0.7322  0.4852          0.7958  0.6562      0.7842  0.8044     0.5679       0.5730       0.4511  0.7171
2020-05-15  QA_Base_MRC_5                   0.7274  0.4858          0.7877  0.6550      0.7754  0.8047     0.5405       0.5619       0.4598  0.7084
2020-05-16  HyperDQA_V4                     0.6893  0.3874          0.7792  0.6309      0.7478  0.7187     0.4867       0.5630       0.4138  0.5685
2020-05-16  HyperDQA_V3                     0.6769  0.3876          0.7774  0.6167      0.7332  0.6961     0.4296       0.5373       0.4138  0.5650
2020-05-16  HyperDQA_V2                     0.6734  0.3818          0.7666  0.6110      0.7332  0.6867     0.4834       0.5560       0.3793  0.5902
2020-05-09  HyperDQA_V1                     0.6717  0.4013          0.7693  0.6197      0.7167  0.6922     0.3598       0.5596       0.4138  0.5504
2020-05-09  bert fulldata fintuned          0.5900  0.4169          0.6870  0.4269      0.6710  0.7315     0.5124       0.4900       0.4483  0.5907
2020-05-01  bert finetuned                  0.5872  0.2986          0.7011  0.4849      0.6359  0.6933     0.4622       0.4751       0.4483  0.4895
2020-04-30  HyperDQA_V0                     0.5715  0.3131          0.6780  0.4732      0.6630  0.5716     0.3623       0.4351       0.3793  0.4941
2020-04-27  bert                            0.4557  0.2233          0.5259  0.2633      0.5113  0.7775     0.4859       0.3565       0.0345  0.5778
2020-05-16  UGLIFT v0.1 (Clova OCR)         0.4417  0.1766          0.5600  0.3178      0.5340  0.4520     0.2253       0.3573       0.4483  0.3356
2020-05-14  Plain BERT QA                   0.3524  0.1687          0.4489  0.2029      0.4321  0.4812     0.3517       0.3096       0.0345  0.3747
2020-05-16  Clova OCR V0                    0.3489  0.0977          0.4855  0.2670      0.3811  0.3958     0.2489       0.2875       0.0345  0.3062
2020-05-01  HDNet                           0.3401  0.2040          0.4688  0.2181      0.4710  0.1916     0.2488       0.2736       0.1379  0.2458
2020-05-16  CLOVA OCR                       0.3296  0.1246          0.4612  0.2455      0.3622  0.3746     0.1692       0.2736       0.0690  0.3205
2020-04-29  docVQAQV_V0.1                   0.3016  0.2010          0.3898  0.3810      0.2933  0.0664     0.1842       0.2736       0.1586  0.1695
2020-04-26  docVQAQV_V0                     0.2342  0.1646          0.3133  0.2623      0.2483  0.0549     0.2277       0.1856       0.1034  0.1635
2020-06-16  Test Submission                 0.0000  0.0000          0.0000  0.0000      0.0000  0.0000     0.0000       0.0000       0.0000  0.0000

Ranking Graphic