Method: Hi-VT5-beamsearch with token type embeddings - Task 1 - DUDE - Document UnderstanDing of Everything 😎

Date: 2023-04-21

Authors: JiangLong He, Mamatha N, Shiv Vignesh, Deepak Kumar

Description: A Hi-VT5 model pretrained on a private custom document collection with a span-masking objective. The pretrained model is then trained on the DUDE dataset and the Multi-Page DocVQA dataset. We also trained separate models for table structure detection, signature detection, logo detection, stamp detection, generic entity recognition, checkbox detection, and font classification. We run these extraction models on the DUDE dataset and feed the extracted information to the Hi-VT5 model as extra embedding layers so that the model can better understand the document.
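
The description does not say how the extra embedding layers are combined with the text embeddings. One common approach, used for segment embeddings in BERT, is to give each token a type id and add the corresponding type vector to its word embedding. The sketch below illustrates that idea only and is not the authors' implementation. The token type names, their ids, the embedding size, and the helper functions `make_embedding_table` and `embed_tokens` are all hypothetical, and random values stand in for learned weights.

```python
import random

# Hypothetical token types: the entry lists the detectors used,
# but not the actual label set or id assignment.
TOKEN_TYPES = ["text", "table", "signature", "logo", "stamp", "entity", "checkbox"]
TYPE_TO_ID = {name: i for i, name in enumerate(TOKEN_TYPES)}


def make_embedding_table(num_rows, dim, seed):
    """Build a small random lookup table standing in for learned embedding weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def embed_tokens(token_ids, type_names, word_table, type_table):
    """Sum each token's word embedding with the embedding of its token type."""
    if len(token_ids) != len(type_names):
        raise ValueError("token_ids and type_names must have the same length")
    combined = []
    for tok, type_name in zip(token_ids, type_names):
        word_vec = word_table[tok]
        type_vec = type_table[TYPE_TO_ID[type_name]]
        combined.append([w + t for w, t in zip(word_vec, type_vec)])
    return combined


DIM = 4  # assumed toy embedding size
word_table = make_embedding_table(num_rows=10, dim=DIM, seed=0)
type_table = make_embedding_table(num_rows=len(TOKEN_TYPES), dim=DIM, seed=1)

# Three tokens: plain text, a token inside a detected table,
# and the same word id inside a detected signature region.
token_ids = [3, 7, 7]
type_names = ["text", "table", "signature"]
embedded = embed_tokens(token_ids, type_names, word_table, type_table)
```

In this sketch the second and third tokens share a word id but receive different vectors because their token types differ, which is the signal a downstream encoder could use to tell table content from signature content.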

Rubèn Tito, Dimosthenis Karatzas, Ernest Valveny: "Hierarchical multimodal transformers for Multi-Page DocVQA", 2022; arXiv:2212.05935, http://arxiv.org/abs/2212.05935.