Results - ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

method: Super_KVer (2023-03-16)

Authors: Lele Xie, Zuming Huang, Boqian Xia, Yu Wang, Yadong Li, Hongbin Wang, Jingdong Chen

Affiliation: Ant Group

Description: An ensemble of discriminative and generative models. The discriminative model is a multimodal method that uses text, layout, and image features; we train it with two different sequence lengths, 2048 and 512. The texts and boxes are produced by independent OCR models. The generative model is an end-to-end method that directly generates K-V pairs from an input image.

[1] Geewook Kim, Teakgyu Hong, et al. OCR-free Document Understanding Transformer. In ECCV 2022.

[2] Yupan Huang, Tengchao Lv, et al. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. In ACM MM 2022.
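
The Super_KVer description does not say how the outputs of the discriminative and generative models are combined. As a minimal sketch of one possible ensembling rule (an assumption, not the authors' method), the snippet below keeps a key-value pair only when at least two models agree on it. The function name `merge_kv_predictions`, the `min_votes` parameter, and the sample predictions are all hypothetical.

```python
from collections import Counter


def merge_kv_predictions(model_outputs, min_votes=2):
    """Keep each (key, value) pair predicted by at least `min_votes` models.

    Hypothetical voting rule; the source does not describe the real one.
    """
    votes = Counter()
    for pairs in model_outputs:
        # Deduplicate within one model so it cannot vote twice for a pair.
        votes.update(set(pairs))
    return sorted(pair for pair, count in votes.items() if count >= min_votes)


# Made-up predictions from three models for a single image.
discriminative_2048 = [("Name", "Alice"), ("Date", "2023-03-16")]
discriminative_512 = [("Name", "Alice"), ("Date", "2023-03-18")]
generative = [("Name", "Alice"), ("Date", "2023-03-16"), ("Total", "42")]

merged = merge_kv_predictions(
    [discriminative_2048, discriminative_512, generative]
)
print(merged)
```

Majority voting is only one option; weighting models by validation score or preferring one model per field type would be equally plausible readings of "ensemble".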


method: Layoutlmv3 (2023-03-16)

Authors: Li Jie, Wang Wei, Li Songtao, Yang Yunxin, Chen Pengyu, Zhou Danya, Li Chao, Hu Shiyu, Zhang Yuqi, Xu Min, Zhao Yiru, Zhang Bin, Zhang Ruixue, Wang Di, Wang Hui, Xiang Dong

Affiliation: SPDB LAB

Email: lst1994824@163.com

Description: We use LayoutLM as the base model and apply SER (semantic entity recognition) + RE (relation extraction) methods to predict the k-v relationships in the image.

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
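
As a rough illustration of how an SER + RE pipeline can yield k-v pairs, the sketch below assumes the SER step has already labeled each text segment and the RE step has already predicted links between segment indices. The label names `KEY`, `VALUE`, and `OTHER`, the data layout, and the helper `build_kv_pairs` are hypothetical and are not taken from the submission.

```python
def build_kv_pairs(entities, relations):
    """Turn labeled entities and predicted links into (key, value) text pairs.

    entities: list of dicts with 'text' and 'label' (assumed SER output).
    relations: list of (key_index, value_index) tuples (assumed RE output).
    """
    pairs = []
    for key_idx, value_idx in relations:
        key, value = entities[key_idx], entities[value_idx]
        # Accept only links that go from a key entity to a value entity.
        if key["label"] == "KEY" and value["label"] == "VALUE":
            pairs.append((key["text"], value["text"]))
    return pairs


# Made-up example: one valid link and one link to a non-value segment.
entities = [
    {"text": "Invoice No.", "label": "KEY"},
    {"text": "A-1001", "label": "VALUE"},
    {"text": "Page 1", "label": "OTHER"},
]
relations = [(0, 1), (0, 2)]

kv_pairs = build_kv_pairs(entities, relations)
print(kv_pairs)
```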


method: Ant-FinCV (2023-03-16)

Authors: Tao Huang, Jie Wang, Tao Xu

Affiliation: Ant Group

Description: An end-to-end, OCR-free transformer for document understanding. The encoder maps a document image into embeddings, and the decoder generates a sequence of tokens from those embeddings; the tokens can be converted into key-value entity links expressed as a structured string. All of the training data is used, training runs for 300 epochs, and multiline keys and values are split. The final result is corrected using the OCR output.

OCR-free Document Understanding Transformer
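
The Ant-FinCV description says the decoder's tokens can be converted into key-value links in a structured string, but it does not give the string format. The sketch below assumes a simple tag markup (`<s_key>…</s_key><s_value>…</s_value>`), which is hypothetical, and shows how such a string could be parsed back into pairs.

```python
import re

# Hypothetical markup: each pair is a key tag followed by a value tag.
PAIR_PATTERN = re.compile(
    r"<s_key>(.*?)</s_key>\s*<s_value>(.*?)</s_value>", re.DOTALL
)


def parse_kv_sequence(generated):
    """Extract (key, value) pairs from a generated structured string."""
    return [
        (key.strip(), value.strip())
        for key, value in PAIR_PATTERN.findall(generated)
    ]


# Made-up decoder output for illustration.
sample = (
    "<s_key>Name</s_key><s_value>Alice</s_value>"
    "<s_key>Total</s_key><s_value> 42 </s_value>"
)
parsed = parse_kv_sequence(sample)
print(parsed)
```

The OCR-based correction step mentioned in the description is not covered here, since the source gives no detail on how it works.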

Source code

Ranking Table


Date	Method	Score1	Score2	Score
2023-03-16	Super_KVer	49.93%	62.97%	56.45%
2023-03-16	Layoutlmv3	29.81%	41.45%	35.63%
2023-03-16	Ant-FinCV	14.44%	22.68%	18.56%
2023-03-16	Ant-FinCV	14.38%	22.62%	18.50%
2023-03-16	Ant-FinCV	14.21%	22.35%	18.28%
2023-03-16	Ant-FinCV	13.79%	21.75%	17.77%
2023-03-14	Layoutlm relation extraction	10.99%	19.22%	15.10%
2023-03-16	Ant-FinCV	8.96%	14.84%	11.90%
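
The page does not define Score1, Score2, or Score. In every row above, however, Score agrees with the arithmetic mean of Score1 and Score2 to within rounding. The short check below, written under that assumption, makes the relationship explicit; the variable names are illustrative only.

```python
# (method, Score1, Score2, Score) copied from the ranking table, in percent.
rows = [
    ("Super_KVer", 49.93, 62.97, 56.45),
    ("Layoutlmv3", 29.81, 41.45, 35.63),
    ("Ant-FinCV", 14.44, 22.68, 18.56),
    ("Ant-FinCV", 14.38, 22.62, 18.50),
    ("Ant-FinCV", 14.21, 22.35, 18.28),
    ("Ant-FinCV", 13.79, 21.75, 17.77),
    ("Layoutlm relation extraction", 10.99, 19.22, 15.10),
    ("Ant-FinCV", 8.96, 14.84, 11.90),
]


def mean_score(score1, score2):
    """Assumed relationship: Score is the mean of Score1 and Score2."""
    return (score1 + score2) / 2


# Largest difference between the reported Score and the computed mean.
max_gap = max(abs(mean_score(s1, s2) - s) for _, s1, s2, s in rows)
print(f"largest gap: {max_gap:.3f} percentage points")
```

The only row that does not match exactly is "Layoutlm relation extraction", where the mean of the rounded inputs is 15.105 and the table reports 15.10, which is consistent with rounding of the underlying values.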
