method: docVQAQV_V0

Date: 2020-04-26

Authors: Jianqi Ma, Jingye Chen

Affiliation: PolyU, Fudan University

Description: We adopt a simple neural reasoning model to learn the mapping from question to answer. The model takes two inputs. The first is the combination of the question's word vectors and spatial features. The second is the combination of the candidate words' word vectors, tag embeddings, and spatial features. The model fuses the two inputs using weighted spatial pooling and then predicts which candidate words are likely answers.
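
The description does not specify the layer sizes, the fusion formula, or the exact form of the weighted spatial pooling, so the following is only a minimal sketch of how such a pipeline could be wired together. It assumes that each question token and each candidate word is projected into a shared hidden space, that the pooling is a softmax-weighted average of the question tokens for each candidate, and that each candidate receives an independent answer probability. All names (`score_candidates`, `W_q`, `W_c`, `w_out`), the dimensions, and the random weights are hypothetical and are not taken from the submission.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score_candidates(question, candidates, W_q, W_c, w_out):
    """Return one answer probability per candidate word.

    question:   list of (word_vec, spatial_vec) pairs
    candidates: list of (word_vec, tag_vec, spatial_vec) triples
    """
    # Input part 1: question word vector concatenated with its spatial feature.
    q_hidden = [matvec(W_q, w + s) for w, s in question]
    probs = []
    for w, t, s in candidates:
        # Input part 2: candidate word vector, tag embedding, spatial feature.
        c = matvec(W_c, w + t + s)
        # Assumed pooling: weight question tokens by similarity to the candidate.
        weights = softmax([dot(c, q) for q in q_hidden])
        pooled = [sum(a * q[i] for a, q in zip(weights, q_hidden))
                  for i in range(len(c))]
        # Assumed fusion: elementwise product followed by tanh.
        fused = [math.tanh(ci * pi) for ci, pi in zip(c, pooled)]
        # Sigmoid so that several words can be selected as the answer.
        probs.append(1.0 / (1.0 + math.exp(-dot(w_out, fused))))
    return probs

# Toy dimensions (hypothetical): word 4, spatial 4 (a box), tag 2, hidden 3.
rng = random.Random(0)
rand_vec = lambda n: [rng.uniform(-1, 1) for _ in range(n)]
rand_mat = lambda r, c: [rand_vec(c) for _ in range(r)]

question = [(rand_vec(4), rand_vec(4)) for _ in range(5)]
candidates = [(rand_vec(4), rand_vec(2), rand_vec(4)) for _ in range(6)]
W_q, W_c, w_out = rand_mat(3, 8), rand_mat(3, 10), rand_vec(3)

probs = score_candidates(question, candidates, W_q, W_c, w_out)
# Candidates above a chosen threshold would be read out as the answer words.
answer_indices = [i for i, p in enumerate(probs) if p > 0.5]
```

In a trained model the weights would be learned from question-answer pairs, and the 0.5 threshold used here is an arbitrary illustrative choice.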