method: sample-1 (2023-03-16)

Authors: Zhenrong Zhang, Lei Jiang, Youhui Guo, Jianshu Zhang, Jun Du

Affiliation: University of Science and Technology of China (USTC), iFLYTEK AI Research

Email: zzr666@mail.ustc.edu.cn

Description: 1. A table cell detection model [1] is applied to split images into table and non-table regions.
2. We perform key-value-background classification for each OCR bounding box using GraphDoc [2].
3. For table regions, we merge OCR boxes into table cells and then locate the left and top key cells for each value cell using handcrafted rules.
4. For non-table regions (including plain text outside table cells in table images), we use an MLP to predict all keys for each value box.
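Step 3 above can be sketched as a simple geometric search. This is a minimal illustration, not the authors' code: cells are hypothetical `(x1, y1, x2, y2, label)` tuples, and for each value cell we take the nearest key cell to its left (with vertical overlap) and the nearest key cell above it (with horizontal overlap).

```python
def find_left_top_keys(cells):
    """Link each value cell to its nearest left key and nearest top key.

    cells: list of (x1, y1, x2, y2, label) tuples with label in
    {"key", "value"}. Returns {value_cell: (left_key or None, top_key or None)}.
    This is an illustrative heuristic, not the authors' actual rules.
    """
    keys = [c for c in cells if c[4] == "key"]
    links = {}
    for v in (c for c in cells if c[4] == "value"):
        x1, y1, x2, y2, _ = v
        # Key cells fully to the left of the value cell that share a row band.
        left = [k for k in keys
                if k[2] <= x1 and not (k[3] <= y1 or k[1] >= y2)]
        # Key cells fully above the value cell that share a column band.
        top = [k for k in keys
               if k[3] <= y1 and not (k[2] <= x1 or k[0] >= x2)]
        links[v] = (
            max(left, key=lambda k: k[2], default=None),  # closest on the left
            max(top, key=lambda k: k[3], default=None),   # closest above
        )
    return links
```

In practice such rules would also need tolerances for skewed or merged cells; the sketch assumes axis-aligned, non-overlapping cells.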
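Step 4 can be sketched as a pairwise scorer: a small MLP scores every (key, value) box pair from geometric features, and pairs above a threshold are linked, so a value box may receive multiple keys. All feature and function names here are illustrative assumptions, not the authors' implementation, and real weights would come from training rather than being hand-set.

```python
import math

def pair_features(key_box, val_box):
    """Illustrative geometry features: normalized center offset and
    log size ratios between a key box and a value box."""
    (kx1, ky1, kx2, ky2), (vx1, vy1, vx2, vy2) = key_box, val_box
    kcx, kcy = (kx1 + kx2) / 2, (ky1 + ky2) / 2
    vcx, vcy = (vx1 + vx2) / 2, (vy1 + vy2) / 2
    kw, kh = kx2 - kx1, ky2 - ky1
    vw, vh = vx2 - vx1, vy2 - vy1
    return [(vcx - kcx) / kw, (vcy - kcy) / kh,
            math.log((vx2 - vx1) / kw), math.log((vy2 - vy1) / kh)]

def mlp_score(x, w1, b1, w2, b2):
    """One hidden layer with ReLU, sigmoid output in (0, 1)."""
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
         for row, bi in zip(w1, b1)]
    z = sum(w * hi for w, hi in zip(w2, h)) + b2
    return 1 / (1 + math.exp(-z))

def link_keys(key_boxes, val_box, params, thresh=0.5):
    """Return every key box whose pair score with val_box exceeds thresh."""
    return [k for k in key_boxes
            if mlp_score(pair_features(k, val_box), *params) > thresh]
```

Scoring each pair independently (rather than picking a single argmax key) is what allows one value box to be linked to several keys, as the description requires.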