Method: Super_KVer - Task 1 - E2E Complex Entity Linking - ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

Method: Super_KVer

Date: 2023-03-16

Authors: Lele Xie, Zuming Huang, Boqian Xia, Yu Wang, Yadong Li, Hongbin Wang, Jingdong Chen

Affiliation: Ant Group

Description: An ensemble of discriminative and generative models. The discriminative model is a multimodal method that uses text, layout, and image features; we train it with two different sequence lengths, 2048 and 512. The texts and boxes are produced by independent OCR models. The generative model is an end-to-end method that directly generates K-V pairs from an input image.
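The description does not say how the outputs of the individual models are combined. As a rough illustration only, the sketch below assumes simple majority voting over predicted K-V pairs; the function name `ensemble_kv_pairs`, the `min_votes` threshold, and the sample predictions are all hypothetical and are not taken from the submission.

```python
from collections import Counter


def ensemble_kv_pairs(model_outputs, min_votes=2):
    """Keep (key, value) pairs predicted by at least `min_votes` models.

    Majority voting is an assumed combination rule, not the authors' method.
    """
    votes = Counter()
    for pairs in model_outputs:
        # Deduplicate per model so each model votes at most once per pair.
        votes.update(set(pairs))
    kept = [pair for pair, n in votes.items() if n >= min_votes]
    return sorted(kept)


# Hypothetical predictions from the three models named in the description.
disc_2048 = [("Name", "Alice"), ("Date", "2023-03-16")]
disc_512 = [("Name", "Alice"), ("Total", "42")]
generative = [("Name", "Alice"), ("Date", "2023-03-16"), ("Total", "41")]

merged = ensemble_kv_pairs([disc_2048, disc_512, generative], min_votes=2)
print(merged)
```

With `min_votes=2`, a pair survives only if at least two of the three models agree on both key and value, so the conflicting "Total" predictions are dropped. Other rules, such as confidence-weighted merging or taking the union, would be equally plausible readings of "ensemble".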

[1] Geewook Kim, Teakgyu Hong, et al. OCR-free Document Understanding Transformer. In ECCV 2022.

[2] Yupan Huang, Tengchao Lv, et al. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. In ACM MM 2022.
