Method: TILT
Date: 2021-04-11

Authors: Research Team



Description: TILT is a neural network architecture that simultaneously learns layout information, visual features, and textual semantics. Contrary to previous approaches, it relies on an encoder-decoder architecture. Results were obtained from a single TILT-Large model, pre-trained as described in the paper and then fine-tuned on the challenge training set.

Ranking Table

Method: TILT
Score: 0.6120

Answer type
  Image span: 0.6765
  Question span: 0.6419
  Multiple spans: 0.4391
  Non span: 0.3832

Evidence
  Table/List: 0.5917
  Textual: 0.7916
  Visual object: 0.4545
  Figure: 0.5654
  Map: 0.4480

Operation
  Comparison: 0.4801
  Arithmetic: 0.4958
  Counting: 0.2652

Ranking Graphic
