method: BERT Large Ensemble (2023-03-18)

Authors: ZhuangZhuang Cai

Affiliation: GammaLab, Pingan

Email: caizhuang588@pingan.com.cn

Description: A news video question-answering method based on OCR, layout analysis, object tracking, and ASR. The method uses OCR to recognize text in video frames, layout analysis to merge the recognized text into paragraphs, object tracking to remove text that is duplicated across frames, and ASR to transcribe the speech in the video clips. The de-duplicated OCR text and the ASR text are concatenated to form the context for an extractive question-answering task, on which we fine-tune the pre-trained model. Our method achieved competitive results in the ICDAR2023 NewsVideoQA competition, demonstrating the effectiveness of OCR and ASR for news video question answering.
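
The context-building step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the OCR and ASR outputs are already available as plain strings, and it swaps the object-tracking de-duplication for a simple string-similarity check. The function names `dedupe_ocr_lines` and `build_context` and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def dedupe_ocr_lines(frame_texts, threshold=0.9):
    """Drop OCR lines that are near-identical to a line already kept.

    Stand-in for the tracking-based de-duplication in the description:
    here duplicates are detected by string similarity only (assumption).
    """
    kept = []  # list of (original_line, normalized_line)
    for line in frame_texts:
        norm = " ".join(line.lower().split())  # lowercase, collapse whitespace
        if not norm:
            continue
        if any(SequenceMatcher(None, norm, seen).ratio() >= threshold
               for _, seen in kept):
            continue
        kept.append((line.strip(), norm))
    return [original for original, _ in kept]


def build_context(ocr_lines, asr_text, sep=" "):
    """Concatenate de-duplicated OCR text and the ASR transcript."""
    parts = dedupe_ocr_lines(ocr_lines) + [asr_text.strip()]
    return sep.join(p for p in parts if p)
```

The resulting string would then serve as the context passage for an extractive QA model; which model and tokenizer to use is left open here, since the description does not name them.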

method: bert-squad2-single (2023-03-18)

Authors: Daquan

Affiliation: None

Email: lindq@shanghaitech.edu.cn

Description: This paper presents an OCR and ASR-based approach for the news video question-answering task. Our approach leverages OCR technology to recognize text in video frames and ASR technology to transcribe the speech in video clips. We then concatenate the OCR and ASR text to form the context for the extractive question-answering task. Our approach achieved competitive results in the ICDAR2023 NewsVideoQA competition, demonstrating the effectiveness of using OCR and ASR technology for news video question-answering.

method: newsvqa model. (2023-03-21)

Authors: Rakshitha R T, Bhoomika Kumta, Soumya Jituri

Affiliation: KLE Technological University

Email: 01fe20bcs107@kletech.ac.in

Description: NewsVideoQA ["Watching the News: Towards VideoQA Models that can Read"] is a text-based VideoQA dataset consisting of videos that contain scene text, which must be read to answer a given question. We fine-tuned the pretrained GIT model ["GIT: Generative Image-to-text Transformer for Vision and Language"] on this dataset (GIT was pretrained on 0.8 billion image-text pairs). GIT is state of the art on a few VideoQA datasets, such as MSRVTT-QA.

Ranking Table

Date        Method               ANLS    ACC
2023-03-18  BERT Large Ensemble  0.7226  0.6251
2023-03-18  bert-squad2-single   0.7035  0.6072
2023-03-21  newsvqa model.       0.3234  0.2724
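
For readers unfamiliar with the ANLS column, the sketch below shows how Average Normalized Levenshtein Similarity is commonly computed in text-based VQA benchmarks. The competition's exact evaluation script is not given here, so the lowercasing, whitespace stripping, and the 0.5 threshold (`tau`) are assumptions based on the usual definition, and the function names are illustrative.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def anls(predictions, ground_truths, tau=0.5):
    """Mean over questions of the best thresholded similarity to any answer.

    predictions: list of predicted answer strings.
    ground_truths: list of lists of accepted answers, one list per question.
    """
    scores = []
    for pred, answers in zip(predictions, ground_truths):
        p = pred.strip().lower()
        best = 0.0
        for ans in answers:
            g = ans.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            # Similarity counts only when the normalized distance is below tau.
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

Under this definition a near miss such as a single OCR character error still earns partial credit, which is why ANLS is typically higher than a strict accuracy figure for the same predictions.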

Ranking Graphic
