method: Infrrd-RADAR (2021-04-01)

Authors: JiangLong He, Aditya Kumar Sarda, Deepak Kumar, Cesar Duran

Affiliation: Infrrd.ai

Description: Infrrd-RADAR (Retrieval of Answers by Document Analysis and Re-ranking) performs OCR on the set of images in the dataset. The OCR output is used together with the image to extract key information from each form, such as the PDC filed date, candidate name, office, and party. The extracted information is stored in a CSV file; in total, 28 fields are extracted from the forms. The natural language questions are parsed using spaCy. The chunks are categorized into subject, object, and dependency object, and the entities into person, geo-political entity, and organization. Using the categorized information, each question is converted into a set of SQL queries. The SQL queries are used with a fuzzy-search algorithm to retrieve a set of relevant documents. A BERT-Large based model is then used to rerank the retrieved documents. The reranked document IDs are used to filter the extracted information. Based on the parsed question, a particular field is collected and returned as the answer.
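The retrieval step described above (SQL queries combined with fuzzy search) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the table schema, the sample rows, the 0.8 threshold, and the use of Python's difflib for fuzzy matching are all hypothetical, and the spaCy parsing and BERT reranking stages are left out.

```python
import sqlite3
from difflib import SequenceMatcher

# Hypothetical extracted fields; the described system stores 28 fields per form.
ROWS = [
    ("doc_1", "Jane Smith", "Mayor", "Democratic"),
    ("doc_2", "John Smyth", "Sheriff", "Republican"),
    ("doc_3", "Alan Brown", "Mayor", "Independent"),
]

# Columns that a query is allowed to search (assumed names).
SEARCHABLE = {"candidate_name", "office", "party"}


def build_db(rows):
    """Load the extracted fields into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE forms (doc_id TEXT, candidate_name TEXT, office TEXT, party TEXT)"
    )
    conn.executemany("INSERT INTO forms VALUES (?, ?, ?, ?)", rows)
    return conn


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (difflib is an assumed stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_retrieve(conn, field, value, threshold=0.8):
    """Return (doc_id, score) pairs whose field fuzzily matches value, best first."""
    if field not in SEARCHABLE:
        raise ValueError(f"unknown field: {field}")
    # The column name comes from the whitelist above, so formatting it is safe.
    rows = conn.execute(f"SELECT doc_id, {field} FROM forms").fetchall()
    scored = [(doc_id, similarity(value, text)) for doc_id, text in rows]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


conn = build_db(ROWS)
# A misspelled name, as OCR output or a question might contain.
print(fuzzy_retrieve(conn, "candidate_name", "Jane Smth"))
```

Matching on similarity rather than equality lets a query tolerate OCR noise in the stored fields, at the cost of a threshold that has to be tuned.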

method: (Baseline) Database (2021-04-12)

Authors: DocVQA Organizers: R. Tito, M. Mathew, C.V. Jawahar, R. Manmatha, D. Karatzas, E. Valveny

Affiliation: CVC-UAB, CVIT-IIIT Hyderabad, Amazon

Description: A two-step method that uses key-value pairs extracted by Amazon Textract to build a database-like data structure, which is then used to rank the documents and answer the questions. The questions are manually parsed into SQL.
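A minimal sketch of the database idea, assuming the key-value pairs are already available as plain dictionaries (no Textract call is made here). The field names, the sample values, the example question, and the hand-written SQL are all invented for illustration and are not taken from the baseline.

```python
import sqlite3

# Stand-in for key-value pairs produced by an OCR service, one dict per document.
EXTRACTED = {
    "doc_1": {"candidate_name": "Jane Smith", "office": "Mayor"},
    "doc_2": {"candidate_name": "John Smyth", "office": "Sheriff"},
    "doc_3": {"candidate_name": "Alan Brown", "office": "Mayor"},
}


def build_kv_table(extracted):
    """Store every (document, field, value) triple as one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kv (doc_id TEXT, field_name TEXT, field_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO kv VALUES (?, ?, ?)",
        [
            (doc_id, name, value)
            for doc_id, pairs in extracted.items()
            for name, value in pairs.items()
        ],
    )
    return conn


# Hand-written SQL for the hypothetical question
# "Which candidates are running for a given office?"
QUERY = """
SELECT n.doc_id, n.field_value
FROM kv AS o
JOIN kv AS n ON n.doc_id = o.doc_id
WHERE o.field_name = 'office' AND o.field_value = ?
  AND n.field_name = 'candidate_name'
ORDER BY n.doc_id
"""


def answer(conn, office):
    """Return the matching document ids and the answer values they contain."""
    rows = conn.execute(QUERY, (office,)).fetchall()
    return [row[0] for row in rows], [row[1] for row in rows]


conn = build_kv_table(EXTRACTED)
doc_ids, answers = answer(conn, "Mayor")
print(doc_ids, answers)
```

A single query yields both outputs the task asks for: the list of documents that support the answer and the answer values themselves.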

Please check the cited paper for detailed information.

method: (Baseline) Text spotting - BERT (2021-04-12)

Authors: DocVQA Organizers: R. Tito, M. Mathew, C.V. Jawahar, R. Manmatha, D. Karatzas, E. Valveny

Affiliation: CVC-UAB, CVIT-IIIT Hyderabad, Amazon

Description: A two-step method that first ranks the documents by their relevance to the question and then uses an extractive BERT QA method to get the answer from the relevant documents.
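The first step, ranking documents by relevance to the question, might look like the sketch below. It deliberately swaps in a simple token-overlap (Jaccard) score in place of the baseline's actual relevance measure, which is not specified here, and it omits the extractive BERT QA step entirely. The sample documents and the question are made up.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_documents(question, documents):
    """Rank documents by Jaccard overlap with the question, best first.

    Jaccard overlap is an assumed stand-in for a learned relevance score.
    Ties are broken by document id so the order is deterministic.
    """
    q = tokens(question)
    scored = []
    for doc_id, text in documents.items():
        d = tokens(text)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical OCR text for three documents.
DOCS = {
    "doc_1": "Candidate Jane Smith office Mayor party Democratic",
    "doc_2": "Candidate John Smyth office Sheriff party Republican",
    "doc_3": "Committee treasurer report for Alan Brown",
}

for doc_id, score in rank_documents("Which party does Jane Smith belong to?", DOCS):
    print(doc_id, round(score, 3))
```

In a full pipeline, only the top-ranked documents would be passed on to the QA model, which keeps the more expensive second step small.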

Please check the cited paper for detailed information.

Ranking Table

Date        Method                           ANLSL   Retrieval MAP
2021-04-01  Infrrd-RADAR                     0.7743  74.66%
2021-04-12  (Baseline) Database              0.7068  71.06%
2021-04-12  (Baseline) Text spotting - BERT  0.4513  72.84%
2020-05-16  PingAn-OneConnect-Gammalab-DQA   0.0000  80.90%
2020-05-05  iFLYTEK-DOCR                     0.0000  79.15%
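
The Retrieval MAP column reports mean average precision over the ranked document lists. The sketch below follows the standard textbook definition of the metric; the challenge's own evaluation script may differ in details such as tie handling, and the document ids in the example are made up.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list, assuming it has no duplicate ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["a", "b", "c", "d"], ["a", "c"]),  # relevant documents at ranks 1 and 3
    (["b", "a"], ["a"]),                 # relevant document at rank 2
]
print(mean_average_precision(runs))
```

Because MAP scores only the document ranking, a method can have a high Retrieval MAP alongside a low ANLSL, as the table's 0.0000 entries show.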

Ranking Graphic
