- Task 1 - E2E Complex Entity Linking
- Task 2 - E2E Complex Entity Labeling
- Task 3 - E2E Zero-shot Structured Text Extraction
- Task 4 - E2E Few-shot Structured Text Extraction
method: Super_KVer (2023-03-16)
Authors: Lele Xie, Zuming Huang, Boqian Xia, Yu Wang, Yadong Li, Hongbin Wang, Jingdong Chen
Affiliation: Ant Group
Email: yule.xll@antgroup.com
Description: An ensemble of discriminative and generative models. The former is a multimodal method that utilizes text, layout, and image; we train this model with two different sequence lengths, 2048 and 512. The texts and boxes are generated by independent OCR models. The latter is an end-to-end method that directly generates K-V pairs for an input image.
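The submission does not describe how the two models' outputs are fused. A minimal sketch of one plausible ensemble rule (a hypothetical `merge_kv_predictions` helper; the actual fusion logic is not stated) could look like:

```python
def merge_kv_predictions(discriminative_pairs, generative_pairs):
    """Union two lists of (key, value) pairs, preferring the
    discriminative model's prediction when both models emit a value
    for the same key. Purely illustrative -- the real fusion rule
    is not described in the submission."""
    merged = dict(generative_pairs)       # generative output as fallback
    merged.update(discriminative_pairs)   # discriminative output wins on conflicts
    return sorted(merged.items())

# Example: the two models agree on "Name" but disagree on "Date".
disc = [("Name", "Alice"), ("Date", "2023-03-16")]
gen = [("Name", "Alice"), ("Date", "2023-03-15"), ("Total", "42.00")]
print(merge_kv_predictions(disc, gen))
# → [('Date', '2023-03-16'), ('Name', 'Alice'), ('Total', '42.00')]
```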
method: End-to-end document relationship extraction (single-model) (2023-03-15)
Authors: Huiyan Wu, Pengfei Li, Can Li, Liang Qiao
Affiliation: Davar-Lab
Description: Our method realizes end-to-end information extraction (single-model) through OCR, NER, and RE. Text extracted by OCR and image features are jointly fed to the NER module to identify key and value entities; the RE module then extracts entity-pair relationships via multi-class classification.
Both NER and RE are based on LayoutLMv3, and our training dataset is HUST-CELL.
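The pair-linking step above can be sketched as scoring each (key, value) entity pair produced by NER. Below, a toy geometric heuristic (nearest key by box-center distance) stands in for the learned RE classifier, which the submission does not detail:

```python
def link_pairs(keys, values):
    """keys/values: lists of (text, (x, y)) entity box centers.
    Link each value entity to its nearest key entity -- an
    illustrative stand-in for the multi-class RE head."""
    links = []
    for v_text, (vx, vy) in values:
        nearest = min(keys, key=lambda k: (k[1][0] - vx) ** 2 + (k[1][1] - vy) ** 2)
        links.append((nearest[0], v_text))
    return links

keys = [("Name:", (10, 10)), ("Date:", (10, 30))]
values = [("Alice", (60, 11)), ("2023-03-15", (60, 29))]
print(link_pairs(keys, values))
# → [('Name:', 'Alice'), ('Date:', '2023-03-15')]
```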
method: sample-3 (2023-03-16)
Authors: Zhenrong Zhang, Lei Jiang, Youhui Guo, Jianshu Zhang, Jun Du
Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research
Email: zzr666@mail.ustc.edu.cn
Description:
1. A table cell detection model [1] splits images into table and non-table regions.
2. We perform key-value-background classification for each OCR bounding box using GraphDoc [2].
3. For table regions, we merge OCR boxes into table cells and then find the left and top keys for each value cell according to manual rules.
4. For non-table regions (including plain text outside table cells in table images), we directly use an MLP to predict all keys for each value box.
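The "left and top keys" rule in step 3 could be sketched as follows; this is a simplified guess at the manual rules (box overlap tests and nearest-neighbor selection), not the authors' actual implementation:

```python
def find_left_and_top_keys(value_cell, key_cells):
    """value_cell: (x1, y1, x2, y2) box; key_cells: {text: box}.
    Return the nearest key cell strictly to the left in the same row
    and the nearest key cell strictly above in the same column --
    an illustrative version of the manual rules in step 3."""
    vx1, vy1, vx2, vy2 = value_cell
    left = top = None
    for text, (x1, y1, x2, y2) in key_cells.items():
        # Strictly to the left, with vertical (row) overlap: keep the closest.
        if x2 <= vx1 and not (y2 <= vy1 or y1 >= vy2):
            if left is None or x2 > key_cells[left][2]:
                left = text
        # Strictly above, with horizontal (column) overlap: keep the closest.
        if y2 <= vy1 and not (x2 <= vx1 or x1 >= vx2):
            if top is None or y2 > key_cells[top][3]:
                top = text
    return left, top

keys = {"Year": (0, 0, 40, 20), "Region": (0, 25, 40, 45),
        "2022": (45, 0, 80, 20)}
value = (45, 25, 80, 45)  # cell below "2022" and right of "Region"
print(find_left_and_top_keys(value, keys))
# → ('Region', '2022')
```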
Date | Method | Score1 | Score2 | Score
---|---|---|---|---
2023-03-16 | Super_KVer | 49.93% | 62.97% | 56.45%
2023-03-15 | End-to-end document relationship extraction (single-model) | 43.55% | 57.90% | 50.73%
2023-03-16 | sample-3 | 42.52% | 56.68% | 49.60%
2023-03-16 | sample-1 | 42.13% | 56.36% | 49.25%
2023-03-16 | Pre-trained model based fullpipe pair extraction (opti_v3, no inf_aug) | 42.17% | 55.63% | 48.90%
2023-03-16 | Pre-trained model based fullpipe pair extraction (opti_v2, no inf_aug) | 42.10% | 55.56% | 48.83%
2023-03-16 | Pre-trained model based fullpipe pair extraction (opti_v2, inf_aug) | 42.01% | 55.50% | 48.76%
2023-03-15 | Pre-trained model based fullpipe pair extraction (opti_v1) | 41.56% | 55.34% | 48.45%
2023-03-16 | Meituan OCR V4 | 41.10% | 54.55% | 47.83%
2023-03-16 | Meituan OCR V3 | 40.67% | 54.17% | 47.42%
2023-03-15 | Meituan OCR V2 | 40.97% | 53.47% | 47.22%
2023-03-16 | submit-trainall | 40.65% | 52.98% | 46.82%
2023-03-14 | Meituan OCR | 39.85% | 52.46% | 46.15%
2023-03-16 | submit-8finetune2 | 39.58% | 51.93% | 45.75%
2023-03-16 | Layoutlmv3 | 29.81% | 41.45% | 35.63%
2023-03-16 | Ant-FinCV | 14.44% | 22.68% | 18.56%
2023-03-16 | Ant-FinCV | 14.32% | 22.70% | 18.51%
2023-03-16 | Ant-FinCV | 14.38% | 22.62% | 18.50%
2023-03-16 | Ant-FinCV | 14.21% | 22.35% | 18.28%
2023-03-16 | Ant-FinCV | 13.79% | 21.75% | 17.77%
2023-03-14 | Layoutlm relation extraction | 10.99% | 19.22% | 15.10%
2023-03-16 | Ant-FinCV | 8.96% | 14.84% | 11.90%