Method: IG-BERT (single model)

Date: 2021-04-09

Authors: Ryota Tanaka, Kyosuke Nishida

Affiliation: NTT Media Intelligence Laboratories, NTT Corporation
Description: IG-BERT is a vision-and-language (V+L) model pre-trained on large-scale infographic-text pairs. The model was initialized from BERT-large and fine-tuned on the training and validation data. Icon visual features were extracted with a Faster R-CNN trained on Visually29K. In the preprocessing stage, OCR text was extracted with the Google Cloud Vision API.