Method: IG-BERT (single model)

Date: 2021-04-09

Authors: Ryota Tanaka, Kyosuke Nishida

Affiliation: NTT Media Intelligence Laboratories, NTT Corporation
Description: IG-BERT is a vision-and-language (V+L) model pre-trained on large-scale infographic-text pairs. The model was initialized from BERT-large and fine-tuned on the training and validation data. Icon visual features were extracted with a Faster R-CNN trained on Visually29K. In the preprocessing stage, OCR text was extracted with the Google Cloud Vision API.