method: LayoutLMv3&StrucText 2023-03-24

Authors: Mei Jiang (姜媚), Minhui Wu (伍敏慧), Chen Li (李琛), Jing Lv (吕静), Haoxi Li (李昊曦), Lifu Wang (王立夫), Sicong Liu (刘思聪)

Affiliation: TencentOCR

Description: Based on a large pretrained model with the LayoutLMv3 and StrucText architectures, plus some pre- and post-processing methods.

method: sample-1 2023-03-25

Authors: Zhenrong Zhang, Lei Jiang, Youhui Guo, Jianshu Zhang, Jun Du

Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research

Email: zzr666@mail.ustc.edu.cn

Description: 1. We use UniLM [2] and LiLT [3] as decoders to exploit text and layout information; OCR results, sorted by manual rules, are fed into the decoder to predict the target.
2. We ensemble DocPrompt [1], UniLM [2] and LiLT [3].
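
The description does not say which manual rules are used to sort the OCR results. As a rough illustration only, the sketch below shows one common heuristic: group boxes into text lines by vertical proximity, then read each line left to right. The `OcrBox` type, the `line_tol` threshold, and both function names are hypothetical and are not taken from the submission.

```python
from dataclasses import dataclass


@dataclass
class OcrBox:
    """One OCR result (hypothetical structure for this sketch)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    h: float  # box height


def sort_reading_order(boxes, line_tol=0.5):
    """Sort boxes top-to-bottom, then left-to-right within each line.

    Assumed rule: a box joins the current line when its vertical centre
    lies within line_tol * height of the first box placed on that line.
    """
    lines = []
    # Visit boxes from top to bottom by vertical centre.
    for box in sorted(boxes, key=lambda b: b.y + b.h / 2):
        centre = box.y + box.h / 2
        if lines:
            first = lines[-1][0]
            if abs(centre - (first.y + first.h / 2)) <= line_tol * first.h:
                lines[-1].append(box)
                continue
        lines.append([box])
    # Within each line, order boxes by their left edge.
    return [b for line in lines for b in sorted(line, key=lambda b: b.x)]


def to_decoder_input(boxes, sep=" "):
    """Join the sorted OCR texts into one string for a text decoder."""
    return sep.join(b.text for b in sort_reading_order(boxes))
```

For example, four boxes laid out as two rows of two would be serialized row by row, regardless of the order in which the OCR engine returned them. Real documents with multiple columns or rotated text would need additional rules.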

[1] https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/ernie-layout/README_ch.md

[2] https://github.com/microsoft/unilm/blob/master/s2s-ft/

[3] Jiapeng Wang, Lianwen Jin and Kai Ding. LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding. 2022, ACL.

Ranking Table

Date        Method                         score   score1  score2
2023-03-24  LayoutLMv3&StrucText           84.43%  87.14%  73.59%
2023-03-24  LayoutLMv3&StrucText           82.51%  85.15%  71.93%
2023-03-25  sample-1                       82.13%  85.24%  69.68%
2023-03-23  LayoutLMv3&StrucText           81.88%  84.60%  70.96%
2023-03-24  task4-base                     74.90%  78.57%  60.21%
2023-03-24  Fewshot-brain_v1               74.39%  77.81%  60.71%
2023-03-23  CHATGPT yyds                   69.77%  72.03%  60.71%
2023-03-24  Dao Xianghu light of TianQuan  68.19%  71.48%  55.03%
2023-03-25  GRGBanking                     43.52%  45.44%  35.83%
2023-03-20  test                            2.03%   2.30%   0.97%
2023-03-25  Task4_gpt                       0.00%   0.00%   0.00%

Ranking Graphic