method: GraphDoc+Classify+Merge (2023-05-25)
Authors: Yan Wang, Jiefeng Ma, Zhenrong Zhang, Pengfei Hu, Jianshu Zhang, Jun Du
Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research
Description: We pre-trained several GraphDoc models on the provided unlabelled documents under different configurations, then fine-tuned them on the training set for 500-1000 epochs. After classifying OCR boxes into the various field categories, we proposed a Merger module to aggregate the classified boxes into fields.
We also applied pre- and post-processing based on the text content and the distances between OCR boxes. Finally, we adopted model ensembling to further improve system performance.
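The classify-then-merge idea can be illustrated with a minimal sketch. The authors' Merger module and their distance-based post-processing are not described in detail, so the threshold-based merger below (and all box fields, class names, and the `max_gap` parameter) are hypothetical stand-ins, purely for illustration:

```python
# Hypothetical sketch of a classify-then-merge step over OCR boxes.
# The real Merger module is learned; a simple same-line, same-class,
# horizontal-gap rule stands in for it here.

def merge_boxes(boxes, max_gap=10.0):
    """Group same-class OCR boxes on the same line whose gap <= max_gap.

    boxes: dicts with 'x0', 'x1', 'y', 'cls' (predicted class), 'text'.
    Returns merged fields, each covering one or more input boxes.
    """
    fields = []
    for box in sorted(boxes, key=lambda b: (b["cls"], b["y"], b["x0"])):
        last = fields[-1] if fields else None
        if (last is not None
                and last["cls"] == box["cls"]
                and last["y"] == box["y"]
                and box["x0"] - last["x1"] <= max_gap):
            last["x1"] = box["x1"]           # extend the merged field
            last["text"] += " " + box["text"]
        else:
            fields.append(dict(box))         # start a new field
    return fields


boxes = [
    {"x0": 0, "x1": 40, "y": 0, "cls": "vendor_name", "text": "Acme"},
    {"x0": 45, "x1": 90, "y": 0, "cls": "vendor_name", "text": "Corp"},
    {"x0": 200, "x1": 260, "y": 0, "cls": "invoice_id", "text": "INV-7"},
]
merged = merge_boxes(boxes)
# "Acme" and "Corp" merge into one vendor_name field; "INV-7" stays alone.
```

A learned Merger can of course capture patterns (multi-line fields, column alignment) that a fixed gap threshold cannot.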
method: baseline - LayoutLMv3 with unsupervised and synthetic pre-training (2023-05-02)
Authors: Organizers
Affiliation: Rossum.ai, Czech Technical University in Prague, University of La Rochelle
Description: Baseline method. Uses multi-label NER formulation with LayoutLMv3 as the backbone. It is pre-trained on the unlabelled and synthetic parts of the DocILE dataset.
method: baseline - RoBERTa-base with synthetic pre-training (2023-05-02)
Authors: Organizers
Affiliation: Rossum.ai, Czech Technical University in Prague, University of La Rochelle
Description: Baseline method. Uses multi-label NER formulation with RoBERTa base as the backbone. It is pre-trained on the synthetic part of the DocILE dataset.
Date | Method | F1 | AP | Precision | Recall
---|---|---|---|---|---
2023-05-25 | GraphDoc+Classify+Merge | 83.32% | 70.06% | 85.37% | 81.36%
2023-05-02 | baseline - LayoutLMv3 with unsupervised and synthetic pre-training | 77.28% | 66.15% | 80.47% | 74.34%
2023-05-02 | baseline - RoBERTa-base with synthetic pre-training | 75.99% | 63.87% | 79.13% | 73.10%
2023-05-02 | baseline - RoBERTa-base | 75.64% | 64.31% | 78.90% | 72.63%
2023-05-02 | baseline - LayoutLMv3 with unsupervised pre-training | 75.60% | 64.00% | 78.69% | 72.75%
2023-05-24 | YOLOv8X+Grid | 68.32% | 50.89% | 63.82% | 73.51%
2023-05-25 | SRCB Submission on Line Item Recognition | 46.32% | 22.11% | 50.07% | 43.09%
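As a sanity check on the table, the reported F1 should be the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R), up to rounding of the displayed figures:

```python
# Verify each row's F1 against 2PR/(P+R); a small tolerance absorbs
# the rounding of the published two-decimal percentages.

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

rows = [  # (method, reported F1, precision, recall), all in %
    ("GraphDoc+Classify+Merge", 83.32, 85.37, 81.36),
    ("LayoutLMv3 unsup.+synth. pre-training", 77.28, 80.47, 74.34),
    ("RoBERTa-base synth. pre-training", 75.99, 79.13, 73.10),
    ("RoBERTa-base", 75.64, 78.90, 72.63),
    ("LayoutLMv3 unsup. pre-training", 75.60, 78.69, 72.75),
    ("YOLOv8X+Grid", 68.32, 63.82, 73.51),
    ("SRCB Line Item Recognition", 46.32, 50.07, 43.09),
]
for name, reported, p, r in rows:
    assert abs(f1(p, r) - reported) < 0.05, name
```

All seven rows pass within rounding tolerance, confirming the table is internally consistent.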