Method: newsvqa model.2023-03-21

Authors: Rakshitha R T, Bhoomika Kumta, Soumya Jituri

Affiliation: KLE Technological University


Description: NewsVideoQA ["Watching the News: Towards VideoQA Models that can Read"] is a text-based VideoQA dataset consisting of videos in which the scene text must be read to answer a given question. We fine-tuned the pretrained GIT model ["GIT: Generative Image-to-text Transformer for Vision and Language"], which was pretrained on 0.8 billion image-text pairs, on this dataset. GIT reports state-of-the-art results on a few VideoQA datasets, such as MSRVTT-QA.
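
To illustrate how a question-answer pair can be prepared for generative fine-tuning, the sketch below follows the scheme described in the GIT paper: the question and answer are concatenated into one text sequence, and the language-modeling loss is applied only to the answer and end-of-sequence tokens. This is a minimal sketch, not the configuration used for this submission. The function names `build_qa_example` and `mask_labels`, the whitespace tokenizer, the `[EOS]` string, and the sample question are all assumptions made for illustration.

```python
from typing import List, Tuple

# Label value commonly used to exclude a position from the loss (assumption:
# the training framework follows this convention).
IGNORE_INDEX = -100


def build_qa_example(question: str, answer: str,
                     eos: str = "[EOS]") -> Tuple[List[str], List[int]]:
    """Concatenate question and answer and mark the positions that are trained on.

    Whitespace splitting stands in for a real tokenizer (illustration only).
    Returns (tokens, loss_mask), where loss_mask[i] is 1 for answer and EOS
    tokens and 0 for question tokens.
    """
    q_tokens = question.split()
    a_tokens = answer.split() + [eos]
    tokens = q_tokens + a_tokens
    loss_mask = [0] * len(q_tokens) + [1] * len(a_tokens)
    return tokens, loss_mask


def mask_labels(token_ids: List[int], loss_mask: List[int]) -> List[int]:
    """Replace the labels of question positions with IGNORE_INDEX."""
    return [tid if keep == 1 else IGNORE_INDEX
            for tid, keep in zip(token_ids, loss_mask)]


if __name__ == "__main__":
    # Hypothetical sample; not taken from the NewsVideoQA dataset.
    tokens, mask = build_qa_example("what is the headline ?", "breaking news")
    print(tokens)
    print(mask)
```

At inference time, under the same scheme, only the question would be given as the text prefix and the model would generate the answer tokens until the end-of-sequence token.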