Authors: Anisha Gunjal, Vipul Gupta, Moinak Bhattacharya, Digvijay Singh
Affiliation: HyperVerge, Inc.
Description: Our method uses transformer models as the backbone for the challenge task, for three reasons:
1. The recent success of transformer models in achieving state-of-the-art results across most natural language benchmarks.
2. Extending transformer models to question answering is a seamless downstream task, compared with the traditional approach of modelling questions with LSTMs and similar recurrent models.
3. The final task is to predict the span of answer tokens (start and end indices) rather than to generate an answer sentence, which fits naturally with the per-token predictions produced by transformer models.
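The span-prediction formulation in point 3 can be sketched as follows. This is a minimal illustrative example, not the submission's actual code: a QA head projects each token's hidden state to a start logit and an end logit, and the answer span is decoded from the argmaxes, with the end constrained to fall at or after the start. Shapes and the `predict_span` helper are assumptions for illustration.

```python
import numpy as np

def predict_span(hidden_states: np.ndarray, qa_weights: np.ndarray):
    """hidden_states: [seq_len, hidden]; qa_weights: [hidden, 2]."""
    logits = hidden_states @ qa_weights            # [seq_len, 2]
    start_logits, end_logits = logits[:, 0], logits[:, 1]
    start = int(np.argmax(start_logits))
    # constrain the end index to come at or after the start index
    end = start + int(np.argmax(end_logits[start:]))
    return start, end

rng = np.random.default_rng(0)
hidden = rng.normal(size=(16, 8))   # 16 tokens, hidden size 8 (toy sizes)
weights = rng.normal(size=(8, 2))
start, end = predict_span(hidden, weights)
assert 0 <= start <= end < 16
```

In practice the start/end logits come from a fine-tuned transformer encoder; the decoding step is the same.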
Hence, we extend two different transformer models, i.e., BERT and LayoutLM. While BERT is a suitable pick due to its natural language understanding, LayoutLM, a transformer model with 2-D layout embeddings, is an even better choice given the layout complexity of the documents in the provided dataset.
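To make the distinction concrete, the following is a hedged sketch of the LayoutLM-style idea: each token carries a bounding box normalized to a 0-1000 grid, and the box coordinates index small embedding tables whose rows are summed with the word embedding, so the model sees spatial layout directly. Table sizes, shared x/y tables, and the `embed` helper are illustrative assumptions, not the exact LayoutLM implementation.

```python
import numpy as np

VOCAB, COORDS, DIM = 100, 1001, 8   # toy sizes; COORDS covers the 0-1000 grid
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(VOCAB, DIM))
x_emb = rng.normal(size=(COORDS, DIM))   # shared table for x0 and x1
y_emb = rng.normal(size=(COORDS, DIM))   # shared table for y0 and y1

def embed(token_id: int, box: tuple) -> np.ndarray:
    """Sum the word embedding with embeddings of the box coordinates."""
    x0, y0, x1, y1 = box
    return (word_emb[token_id]
            + x_emb[x0] + y_emb[y0]
            + x_emb[x1] + y_emb[y1])

vec = embed(token_id=42, box=(100, 200, 180, 220))
assert vec.shape == (DIM,)
```

A plain BERT input drops the box terms and keeps only the word (plus 1-D position) embeddings, which is why BERT relies on language context alone.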
Overall, LayoutLM performs very well where understanding the layout takes precedence, especially on forms and tabular data, which make up a good fraction of the overall dataset.
BERT complemented LayoutLM in the cases where it failed, namely those requiring strong language-context understanding.
Beyond the document layouts mentioned above, our method also inherently learns to locate answers when the same entity information appears in different formats, such as addresses, titles, headings, salutations, and similar notations.
Finally, for our submission we experiment with a few techniques to ensemble the two models.
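One simple ensembling scheme for two span-prediction models, shown here as a sketch rather than the submission's exact method, is to average the per-token start and end logits from both models and decode the span from the means. The `ensemble_span` helper and the toy logits are assumptions for illustration.

```python
import numpy as np

def ensemble_span(start_a, end_a, start_b, end_b):
    """Average two models' start/end logits, then decode one span."""
    start_logits = (np.asarray(start_a) + np.asarray(start_b)) / 2
    end_logits = (np.asarray(end_a) + np.asarray(end_b)) / 2
    start = int(np.argmax(start_logits))
    # end index is searched at or after the chosen start
    end = start + int(np.argmax(end_logits[start:]))
    return start, end

# toy logits over a 5-token sequence from two hypothetical models
s, e = ensemble_span([0., 3., 1., 0., 0.], [0., 0., 1., 3., 0.],
                     [0., 2., 1., 0., 0.], [0., 0., 0., 2., 1.])
assert (s, e) == (1, 3)
```

Other common variants include weighting the models by validation accuracy or routing each document to the model suited to its layout; logit averaging is simply the most direct baseline.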