method: Super_KVer (2023-03-16)

Authors: Lele Xie, Zuming Huang, Boqian Xia, Yu Wang, Yadong Li, Hongbin Wang, Jingdong Chen

Affiliation: Ant Group

Email: yule.xll@antgroup.com

Description: An ensemble of discriminative and generative models. The discriminative model is a multimodal method that uses text, layout, and image features; we train it with two different sequence lengths, 2048 and 512. The texts and boxes are produced by independent OCR models. The generative model is an end-to-end method that directly generates K-V pairs from an input image.

method: Layoutlmv3 (2023-03-16)

Authors: Li Jie, Wang Wei, Li Songtao, Yang Yunxin, Chen Pengyu, Zhou Danya, Li Chao, Hu Shiyu, Zhang Yuqi, Xu Min, Zhao Yiru, Zhang Bin, Zhang Ruixue, Wang Di, Wang Hui, Xiang Dong

Affiliation: SPDB LAB

Email: lst1994824@163.com

Description: We use LayoutLM as the base model and apply SER (semantic entity recognition) and RE (relation extraction) to predict the K-V relationships in the image.

method: Ant-FinCV (2023-03-16)

Authors: Tao Huang, Jie Wang, Tao Xu

Affiliation: Ant Group

Description: An end-to-end, OCR-free transformer for document understanding. The encoder maps a document image into embeddings, and the decoder generates a sequence of tokens from those embeddings; the tokens can be converted into a structured string of K-V entity links. All of the training data is used, the model is trained for 300 epochs, and multi-line keys and values are split. The final result is corrected using the OCR output.
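
To illustrate how a generated token string could be converted into K-V pairs, here is a minimal sketch. The tag format (`<pair>`, `<key>`, `<value>`) and the function name `parse_kv_string` are hypothetical and chosen only for illustration; the submission does not state the actual output format.

```python
import re

# Hypothetical output format (an assumption, not the submission's real format):
#   <pair><key>KEY TEXT</key><value>VALUE TEXT</value></pair>...
PAIR_PATTERN = re.compile(
    r"<pair><key>(.*?)</key><value>(.*?)</value></pair>",
    re.DOTALL,
)


def parse_kv_string(generated: str) -> list[tuple[str, str]]:
    """Extract (key, value) pairs from a decoded string in the assumed tag format."""
    return [(key.strip(), value.strip()) for key, value in PAIR_PATTERN.findall(generated)]


if __name__ == "__main__":
    decoded = (
        "<pair><key>Name</key><value>Alice</value></pair>"
        "<pair><key> Date </key><value>2023-03-16</value></pair>"
    )
    print(parse_kv_string(decoded))
```

Malformed or unclosed pairs are simply skipped by the regular expression, which is one place where a correction step based on OCR output could help.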

Ranking Table

Date        Method                        Score1  Score2  Score
2023-03-16  Super_KVer                    49.93%  62.97%  56.45%
2023-03-16  Layoutlmv3                    29.81%  41.45%  35.63%
2023-03-16  Ant-FinCV                     14.44%  22.68%  18.56%
2023-03-16  Ant-FinCV                     14.38%  22.62%  18.50%
2023-03-16  Ant-FinCV                     14.21%  22.35%  18.28%
2023-03-16  Ant-FinCV                     13.79%  21.75%  17.77%
2023-03-14  Layoutlm relation extraction  10.99%  19.22%  15.10%
2023-03-16  Ant-FinCV                      8.96%  14.84%  11.90%
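
In the rows above, the final Score is numerically consistent with the arithmetic mean of Score1 and Score2. This is an observation from the listed values, not a documented definition of the metric. A minimal sketch that checks it, with the row values copied from the table and the helper names (`mean_score`, `max_deviation`) chosen for illustration:

```python
# Rows copied from the ranking table: (method, Score1, Score2, Score), in percent.
ROWS = [
    ("Super_KVer", 49.93, 62.97, 56.45),
    ("Layoutlmv3", 29.81, 41.45, 35.63),
    ("Ant-FinCV", 14.44, 22.68, 18.56),
    ("Ant-FinCV", 14.38, 22.62, 18.50),
    ("Ant-FinCV", 14.21, 22.35, 18.28),
    ("Ant-FinCV", 13.79, 21.75, 17.77),
    ("Layoutlm relation extraction", 10.99, 19.22, 15.10),
    ("Ant-FinCV", 8.96, 14.84, 11.90),
]


def mean_score(score1: float, score2: float) -> float:
    """Arithmetic mean of the two sub-scores (assumed relationship)."""
    return (score1 + score2) / 2


def max_deviation(rows) -> float:
    """Largest absolute gap between the listed Score and the mean of Score1 and Score2."""
    return max(abs(mean_score(s1, s2) - s) for _, s1, s2, s in rows)


if __name__ == "__main__":
    # Every row agrees to within 0.01 percentage points, i.e. within
    # the rounding error of the two-decimal values shown.
    print(f"max deviation: {max_deviation(ROWS):.4f}")
```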

Ranking Graphic