Authors: Ryota Tanaka, Kyosuke Nishida
Affiliation: NTT Media Intelligence Laboratories, NTT Corporation
Description: IG-BERT is a vision-and-language (V+L) model pre-trained on large-scale infographic-text pairs. The model was initialized from BERT-large and trained on the training and validation data. We extracted icon visual features using a Faster R-CNN trained on Visually29K. In the preprocessing stage, we used the Google Cloud Vision API to extract OCR text.