Method: newsvqa model.2023-03-21
Authors: Rakshitha R T, Bhoomika Kumta, Soumya Jituri
Affiliation: KLE Technological University
Email: 01fe20bcs107@kletech.ac.in
Description: NewsVideoQA ["Watching the News: Towards VideoQA Models that can Read"] is a text-based VideoQA dataset of videos containing scene text that must be read to answer the given questions. We fine-tuned the pretrained GIT model ["GIT: Generative Image-to-text Transformer for Vision and Language"] on this dataset. GIT was pretrained on 0.8 billion image-text pairs and reported state-of-the-art results on several VideoQA benchmarks, such as MSRVTT-QA.