Neural Attention Reader for Video Comprehension

Published in KDD 2018 Deep Learning Day, 2018

Recommended citation: @article{gupta2018neural, title={Neural Attention Reader for Video Comprehension}, author={Gupta, Ashish and Mehrotra, Rishabh and Gupta, Manish}, year={2018} }

Abstract: Despite the increasing availability of informative video content, question answering on videos remains under-researched. The free-flowing nature of the verbal content, the long duration of videos, and the lack of clear demarcations where the context changes make answering questions from video transcripts challenging. We consider the problem of extracting answers for a given question from a pool of videos and propose a novel gated neural attention architecture with a content bifurcation module (GABiNet) to infer answers from video content using transcript data. The proposed GABiNet model is efficient enough to consider a large number of candidate videos, and it jointly learns the question and content representations by incorporating question information into the content representation. To deal with the lack of demarcation, we propose a number of content bifurcation techniques that enable the neural model to divide the transcript text into meaningful chunks, making answer inference tractable. Based on experiments on a large dataset of educational videos, we investigate the benefits offered by the gating, attention, and bifurcation mechanisms and demonstrate significant performance gains over a number of established baselines and state-of-the-art question answering (QA) techniques. We contend that our work is among the first to tackle open-domain question answering on video content, and our findings have implications for the design of video-based QA systems.
