method: GraphDoc+Classify+Merge (2023-05-24)

Authors: Yan Wang, Jiefeng Ma, Zhenrong Zhang, Pengfei Hu, Jianshu Zhang, Jun Du

Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research

Description: We pre-trained several GraphDoc models on the provided unlabelled documents under different configurations, then fine-tuned them on the training set for 200-500 epochs. After classifying OCR boxes into various categories, we used a proposed Merger module to handle the aggregation process.
We also applied pre- and post-processing based on the text content and the distances between OCR boxes. Finally, we adopted model ensembling to further improve system performance.
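
The description does not say how the Merger module works internally. As a rough illustration of the aggregation step only, the sketch below groups OCR boxes that share a predicted category and lie within a distance threshold, then merges each group into one field. The `Box` type, the `merge_boxes` function, the `max_gap` threshold, and the label strings are all hypothetical; this is a rule-based stand-in, not the authors' Merger.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float
    label: str  # predicted category of the OCR box (hypothetical names)
    text: str


def gap(a: Box, b: Box) -> float:
    """Euclidean gap between two axis-aligned boxes (0 if they overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def merge_boxes(boxes, max_gap=10.0):
    """Merge boxes with the same label whose gap is at most max_gap.

    Assumption: a simple union-find grouping; the real Merger may differ.
    """
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            same_label = boxes[i].label == boxes[j].label
            if same_label and gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    merged = []
    for members in groups.values():
        members.sort(key=lambda b: (b.y0, b.x0))  # rough reading order
        merged.append(Box(
            min(b.x0 for b in members),
            min(b.y0 for b in members),
            max(b.x1 for b in members),
            max(b.y1 for b in members),
            members[0].label,
            " ".join(b.text for b in members),
        ))
    return merged
```

The pairwise loop is quadratic in the number of boxes, which is usually acceptable for a single page; a spatial index would be the natural replacement for larger inputs.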

method: YOLOv8X+Grid (2023-05-08)

Authors: Jakub Straka

Affiliation: University of West Bohemia, Department of Cybernetics

Description: The KILE task can be solved in many different ways. We chose to approach it as object detection, treating each field in the document as an object. We used YOLOv8X, a model based on a convolutional neural network, as the detector. Advantages of this model include its speed and small size. We also incorporated methods used in [1].

[1] Anoop Raveendra Katti, Christian Reisswig, Cordula Guder, Sebastian Brarda, Steffen Bickel, Johannes Höhne, and Jean Baptiste Faddoul. Chargrid: Towards understanding 2D documents. arXiv preprint arXiv:1809.08799, 2018.
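
The description does not state which methods from [1] were used or how they were combined with the detector. As a minimal sketch of the core idea in that reference, the code below paints each character's vocabulary index into the region covered by its bounding box, giving a 2D grid that keeps the page layout. The function name `build_chargrid`, the input tuple format, and the convention that index 0 means background are assumptions made for illustration.

```python
def build_chargrid(chars, width, height, vocab=None):
    """Build a character grid from character-level OCR output.

    chars: iterable of (char, x0, y0, x1, y1) in integer pixel coordinates,
           with exclusive upper bounds (assumed format).
    Returns (grid, vocab), where grid[y][x] is the index of the character
    covering that pixel, or 0 for background.
    """
    vocab = {} if vocab is None else vocab
    grid = [[0] * width for _ in range(height)]
    for ch, x0, y0, x1, y1 in chars:
        # Assign the next free index to characters not seen before.
        idx = vocab.setdefault(ch, len(vocab) + 1)
        # Clip the box to the page and fill it with the character index.
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                grid[y][x] = idx
    return grid, vocab
```

A grid like this could, for example, be supplied to a convolutional detector as an extra input channel alongside the page image; whether the submission did so is not stated.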


Affiliation: UIT - University of Information Technology - VNUHCM


Description: Our approach is based on the baseline checkpoint with some improvements. We trained or used the following models:
1. RoBERTa base trained from scratch with FGM and the Lion optimizer on synthetic data for 30 epochs, then trained on annotated data.
2. Our RoBERTa model (checkpoint) with the Lion optimizer.
3. RoBERTa base (checkpoint).

We then ensembled them by taking the union of the words marked with one of the 55 field types, followed by post-processing.
Next, we used the ensembled model to predict labels for the unlabeled data, used the resulting pseudo-labeled data to pre-train the 3 models, and then trained them on annotated data.
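
The union step can be sketched as follows, assuming each model outputs a mapping from word id to a field type (or `None` for background). The description does not say how disagreements between models are resolved, so this sketch assumes the earliest model in the list wins; the function name `union_ensemble` and the example field names are hypothetical.

```python
def union_ensemble(predictions):
    """Union word-level predictions from several models.

    predictions: list of dicts {word_id: field_type or None}, ordered by
                 priority (assumption: earlier models win on conflicts).
    Returns a dict containing every word that at least one model marked
    with a field type.
    """
    merged = {}
    for model_pred in predictions:
        for word_id, field in model_pred.items():
            if field is not None and word_id not in merged:
                merged[word_id] = field
    return merged


# Illustrative inputs only; ids and field names are made up.
model_a = {1: "amount_due", 2: None}
model_b = {2: "date_issue", 3: None}
model_c = {1: "vendor_name", 4: "currency_code"}
ensembled = union_ensemble([model_a, model_b, model_c])
```

A union tends to favor recall over precision, since a word is kept as soon as any single model marks it, which is presumably why a post-processing step follows.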


Ranking Table

Date | Method | AP | F1 | Precision | Recall
2023-05-02 | baseline - RoBERTa-base with synthetic pre-training | 53.90% | 66.38% | 65.86% | 66.92%
2023-05-02 | baseline - LayoutLMv3 with unsupervised and synthetic pre-training | 51.22% | 65.47% | 66.20% | 64.76%
2023-05-02 | baseline - LayoutLMv3 with unsupervised pre-training | 50.68% | 63.86% | 63.58% | 64.15%
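
In each baseline row, the second percentage agrees, to within rounding, with the harmonic mean of the third and fourth, which is consistent with reading those three values as F1, precision, and recall. The short check below recomputes it from the figures in the table; the helper name `f1_score` and the tolerance of 0.02 percentage points are choices made for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (second, third, fourth) percentages from each baseline row of the table.
rows = [
    (66.38, 65.86, 66.92),
    (65.47, 66.20, 64.76),
    (63.86, 63.58, 64.15),
]

# Differences stay small because the inputs are themselves rounded values.
differences = [abs(f1_score(p, r) - reported) for reported, p, r in rows]
all_consistent = all(d < 0.02 for d in differences)
```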
