Results - ICDAR 2023 Competition on Text-based Video Question Answering on News Videos

Task 1 - NewsVideoQA

method: BERT Large Ensemble (2023-03-18)

Authors: ZhuangZhuang Cai

Affiliation: GammaLab, Pingan

Description: News video question-answering method based on OCR, layout analysis, object tracking, and ASR. The method uses OCR to recognize text in video frames, layout analysis to merge paragraphs, object tracking to remove text duplicated across frames, and ASR to transcribe speech in the video clips. The de-duplicated OCR text and the ASR text are concatenated to form the context for an extractive question-answering task, on which we fine-tuned the pre-trained model. Our method achieved competitive results in the ICDAR2023 NewsVideoQA competition, demonstrating the effectiveness of using OCR and ASR technologies for news video question-answering.
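The context-building step described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes OCR output arrives as one list of text lines per frame, and it swaps in simple normalized-string matching for the object-tracking de-duplication the description mentions. The names `normalize`, `dedupe_ocr`, and `build_context` are hypothetical.

```python
from typing import Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical lines compare equal."""
    return " ".join(text.lower().split())


def dedupe_ocr(frames: Iterable[List[str]]) -> List[str]:
    """Keep the first occurrence of each OCR line across frames, in order.

    Assumption: plain string matching stands in for the object-tracking
    de-duplication used in the described method.
    """
    seen = set()
    unique = []
    for frame in frames:
        for line in frame:
            key = normalize(line)
            if key and key not in seen:
                seen.add(key)
                unique.append(line.strip())
    return unique


def build_context(frames: Iterable[List[str]], asr_text: str) -> str:
    """Concatenate de-duplicated OCR text and the ASR transcript into one context."""
    ocr_text = " ".join(dedupe_ocr(frames))
    return " ".join(part for part in (ocr_text, asr_text.strip()) if part)


frames = [
    ["BREAKING NEWS", "Mayor resigns"],
    ["Breaking  news", "Mayor resigns", "Live from Paris"],
]
context = build_context(frames, "the mayor announced today")
print(context)
```

The resulting string would then be passed, together with the question, to an extractive question-answering model.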

Report (ZhuangZhuang Cai, GammaLab, Pingan, China)

Source code

method: bert-squad2-single (2023-03-18)

Authors: Daquan

Affiliation: None

Email: lindq@shanghaitech.edu.cn

Description: This paper presents an OCR and ASR-based approach for the news video question-answering task. Our approach leverages OCR technology to recognize text in video frames and ASR technology to transcribe the speech in video clips. We then concatenate the OCR and ASR text to form the context for the extractive question-answering task. Our approach achieved competitive results in the ICDAR2023 NewsVideoQA competition, demonstrating the effectiveness of using OCR and ASR technology for news video question-answering.
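Extractive question answering of the kind mentioned above typically selects an answer span from the context. The sketch below is an illustration under assumptions, not the submitted system: it assumes a BERT-style model that outputs one start score and one end score per context token, and it ignores the no-answer case. The function name `best_span` and the toy scores are hypothetical.

```python
from typing import List, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start score + end score.

    Only spans with start <= end and at most max_answer_len tokens are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        # Limit the end index so the span stays within max_answer_len tokens.
        for j in range(i, min(i + max_answer_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Toy example with made-up scores for a five-token context.
tokens = ["the", "mayor", "of", "paris", "resigned"]
start = [0.1, 0.2, 0.0, 2.5, 0.3]
end = [0.0, 0.1, 0.2, 2.0, 1.0]
i, j = best_span(start, end)
answer = " ".join(tokens[i : j + 1])
print(answer)
```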

ICDAR2023 NewsVideoQA Competition Technical Report, Daquan Lin, 2023

method: newsvqa model. (2023-03-21)

Authors: Rakshitha R T, Bhoomika Kumta, Soumya jituri

Affiliation: KLE Technological University

Email: 01fe20bcs107@kletech.ac.in

Description: NewsVideoQA ["Watching the News: Towards VideoQA Models that can Read"] is a text-based VideoQA dataset consisting of videos that contain scene text needed to answer a given question. We fine-tuned the pretrained GIT model ["GIT: Generative Image-to-text Transformer for Vision and Language"], which was pretrained on 0.8 billion image-text pairs, on this dataset. GIT is state of the art on a few VideoQA datasets, such as MSRVTT-QA.

Ranking Table


Date	Method	ANLS	ACC
2023-03-18	BERT Large Ensemble	0.7226	0.6251
2023-03-18	bert-squad2-single	0.7035	0.6072
2023-03-21	newsvqa model.	0.3234	0.2724
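
The two metric columns can be illustrated with a short sketch. It assumes ANLS follows the usual Average Normalized Levenshtein Similarity definition (per-question similarity of 1 minus the normalized edit distance, set to 0 when that distance reaches a 0.5 threshold) and that ACC is exact-match accuracy. The lowercasing and whitespace stripping are assumptions, and the competition's official evaluation script may normalize answers differently. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def anls_score(pred: str, answers: Sequence[str], threshold: float = 0.5) -> float:
    """Best similarity against any ground-truth answer, zeroed past the threshold."""
    p = pred.strip().lower()
    best = 0.0
    for ans in answers:
        a = ans.strip().lower()
        longest = max(len(p), len(a))
        nl = levenshtein(p, a) / longest if longest else 0.0
        if nl < threshold:
            best = max(best, 1.0 - nl)
    return best


def evaluate(preds: List[str], golds: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (ANLS, ACC) averaged over all questions."""
    n = len(preds)
    anls = sum(anls_score(p, g) for p, g in zip(preds, golds)) / n
    acc = sum(
        any(p.strip().lower() == a.strip().lower() for a in g)
        for p, g in zip(preds, golds)
    ) / n
    return anls, acc


anls, acc = evaluate(["paris", "london"], [["Paris"], ["Berlin"]])
print(anls, acc)
```

Under this definition ANLS gives partial credit for near-miss answers, such as small OCR errors, which is why it is never lower than ACC in the table.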
