- Task 1 - E2E Complex Entity Linking
- Task 2 - E2E Complex Entity Labeling
- Task 3 - E2E Zero-shot Structured Text Extraction
- Task 4 - E2E Few-shot Structured Text Extraction
Method: LayoutLMV3&StrucText (submitted 2023-03-21)
Authors: Minhui Wu (伍敏慧), Mei Jiang (姜媚), Chen Li (李琛), Jing Lv (吕静), Qingxiang Lin (林庆祥), Fan Yang (杨帆)
Affiliation: TencentOCR
Description: Our methods are based on the LayoutLMv3 and StrucText v1 architectures; all models are fine-tuned from large pre-trained LayoutLM and StrucText checkpoints. During training and testing, we apply preprocessing that merges and splits badly detected boxes. Because the entity labels of key-value (kv) pair boxes are ignored, we use a model trained on Task 1 images to predict kv relations for the text boxes in the Task 2 training/testing images. We then add two extra label classes (question/answer) and map the original labels onto them (other -> question/answer) to ease training. Likewise, during testing we use the kv-prediction model to filter out text boxes that participate in kv relations, and use the model trained on Task 2 to predict entity labels for the remaining boxes. Finally, we ensemble the predictions of the different models based on scores and rules, and apply postprocessing that merges texts sharing the same entity label to produce the final output (see the sketch below).
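To make the kv-based relabeling and filtering concrete, here is a minimal Python sketch of those two steps. The function names, the `predict` interface, and the dict-like box records are our assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of the kv-based label remapping (training) and
# box filtering (testing) described above. Model objects are assumed
# to expose a predict() method; box records are plain dicts.

def remap_labels_for_training(boxes, kv_model):
    """Map 'other' boxes that the Task-1 kv model identifies as part of
    a key-value pair onto the added question/answer classes."""
    kv_roles = kv_model.predict(boxes)  # {box_id: 'question'|'answer'|None}
    for box in boxes:
        role = kv_roles.get(box["id"])
        if box["label"] == "other" and role is not None:
            box["label"] = role  # other -> question/answer
    return boxes

def predict_entity_labels(boxes, kv_model, task2_model):
    """At test time, drop boxes that participate in kv relations and let
    the Task-2 model label only the remaining boxes."""
    kv_roles = kv_model.predict(boxes)
    remaining = [b for b in boxes if kv_roles.get(b["id"]) is None]
    return task2_model.predict(remaining)
```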
Method: LayoutLM&StrucText (submitted 2023-03-20)
Authors: Minhui Wu (伍敏慧), Mei Jiang (姜媚), Chen Li (李琛), Jing Lv (吕静), Qingxiang Lin (林庆祥), Fan Yang (杨帆)
Affiliation: TencentOCR
Description: Same as the LayoutLMV3&StrucText (2023-03-21) submission above.
Method: sample-3 (submitted 2023-03-21)
Authors: Zhenrong Zhang, Lei Jiang, Youhui Guo, Jianshu Zhang, Jun Du
Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research
Email: zzr666@mail.ustc.edu.cn
Description: 1. We use GraphDoc [1] to perform bounding-box classification, which exploits textual, layout, and visual information simultaneously.
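As an illustration of classifying a box from combined text, layout, and visual features, here is a minimal PyTorch sketch. The fusion scheme, feature dimensions, and class count are assumptions for illustration, not GraphDoc's actual architecture.

```python
import torch
import torch.nn as nn

class BoxClassifier(nn.Module):
    """Illustrative multimodal box classifier: fuses per-box text,
    layout (coordinates), and visual features before a linear head.
    Dimensions and additive fusion are assumptions, not GraphDoc's."""

    def __init__(self, text_dim=768, vis_dim=256, hidden=512, num_classes=10):
        super().__init__()
        self.layout_proj = nn.Linear(4, hidden)       # (x0, y0, x1, y1), normalized
        self.text_proj = nn.Linear(text_dim, hidden)  # e.g. embedding of the box's OCR text
        self.vis_proj = nn.Linear(vis_dim, hidden)    # e.g. RoI-pooled image feature
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(hidden, num_classes))

    def forward(self, text_feat, box_coords, vis_feat):
        fused = (self.text_proj(text_feat)
                 + self.layout_proj(box_coords)
                 + self.vis_proj(vis_feat))
        return self.head(fused)  # per-box class logits

# Usage with a batch of 8 boxes and random stand-in features:
model = BoxClassifier()
logits = model(torch.randn(8, 768), torch.rand(8, 4), torch.randn(8, 256))
```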
Date | Method | Score1 | Score2 | Score
---|---|---|---|---
2023-03-21 | LayoutLMV3&StrucText | 57.78% | 55.32% | 57.29%
2023-03-20 | LayoutLM&StrucText | 55.65% | 52.99% | 55.12%
2023-03-21 | sample-3 | 47.15% | 41.91% | 46.10%
2023-03-21 | sample-1 | 46.35% | 41.15% | 45.31%
2023-03-21 | Task 1 transfer learning LiLT + Task 3 transfer learning LiLT + LiLT + LayoutLMv3 ensemble | 45.70% | 40.20% | 44.60%
2023-03-21 | LayoutMask-v3 | 44.79% | 42.53% | 44.34%
2023-03-21 | LayoutMask-v1 | 44.76% | 42.41% | 44.29%
2023-03-21 | LayoutMask-v2 | 44.65% | 41.87% | 44.09%
2023-03-20 | Pre-trained model based entity extraction (ro) | 44.98% | 40.06% | 43.99%
2023-03-21 | sample-2 | 44.83% | 40.65% | 43.99%
2023-03-20 | Pre-trained model based entity extraction (roxy) | 44.96% | 40.06% | 43.98%
2023-03-20 | Pre-trained model based entity extraction (split_ro) | 44.61% | 39.49% | 43.59%
2023-03-20 | Pre-trained model based entity extraction (split_roxy) | 44.60% | 39.49% | 43.58%
2023-03-21 | EXO-brain for KIE | 44.02% | 39.63% | 43.14%
2023-03-21 | Ex-brain for KIE | 44.00% | 39.46% | 43.09%
2023-03-21 | Ex-brain for KIE | 43.66% | 39.30% | 42.79%
2023-03-21 | multi-modal based KIE through model fusion | 42.42% | 37.97% | 41.53%
2023-03-20 | Aaaa | 42.03% | 37.14% | 41.05%
2023-03-21 | multi-modal based KIE through model fusion of different model | 41.94% | 36.90% | 40.93%
2023-03-20 | donut | 41.64% | 37.65% | 40.84%
2023-03-20 | multi-modal based KIE using LayoutLMv3 | 41.64% | 36.77% | 40.67%
2023-03-21 | multi-modal based KIE through model fusion | 41.28% | 37.37% | 40.50%
2023-03-20 | Ant-FinCV | 41.61% | 35.98% | 40.48%
2023-03-20 | result of zhang | 41.66% | 35.65% | 40.46%
2023-03-19 | result of zhang | 40.63% | 39.13% | 40.33%
2023-03-21 | Ex-brain for KIE | 41.38% | 35.14% | 40.13%
2023-03-17 | multi-modal based KIE using LayoutLMv3 | 40.64% | 36.61% | 39.83%
2023-03-21 | result v5 | 40.93% | 35.03% | 39.75%
2023-03-20 | result of zhang v4 | 40.71% | 35.39% | 39.65%
2023-03-20 | result of zhang v3 | 40.57% | 34.79% | 39.42%
2023-03-21 | KIE-ner-ocrapi | 35.47% | 42.44% | 36.87%
2023-03-20 | KIE-ner | 35.41% | 42.61% | 36.85%
2023-03-17 | 20230317 | 0.45% | 0.68% | 0.50%
2023-03-16 | test | 0.02% | 0.00% | 0.01%
2023-03-16 | ttt | 0.02% | 0.00% | 0.01%
2023-03-20 | test | 0.02% | 0.00% | 0.01%
2023-03-20 | 1 | 0.02% | 0.00% | 0.01%
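Across the rows above, the overall Score is consistent with a 0.8/0.2 weighted average of the two sub-scores, i.e. Score = 0.8 * Score1 + 0.2 * Score2 (e.g. 0.8 * 57.78 + 0.2 * 55.32 = 57.29). This weighting is inferred from the table itself, not from an official task description; a quick sanity check:

```python
# Weighting inferred from the table, not an official formula:
# Score ~= 0.8 * Score1 + 0.2 * Score2 (inputs are rounded to 0.01%).
rows = [(57.78, 55.32, 57.29), (44.79, 42.53, 44.34), (0.45, 0.68, 0.50)]
for s1, s2, total in rows:
    assert abs(0.8 * s1 + 0.2 * s2 - total) <= 0.01, (s1, s2, total)
print("0.8/0.2 weighting matches the sampled rows")
```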