Unsupervised Alignment of Natural Language with Video

Posted on:2016-03-09

Degree:Ph.D

Type:Thesis

University:University of Rochester

Candidate:Naim, Iftekhar

GTID:2478390017981162

Subject:Computer Science

Abstract/Summary:

Today we encounter large amounts of video data, often accompanied by text descriptions (e.g., cooking videos and recipes, videos of wet-lab experiments and protocols, movies and scripts). Extracting meaningful information from these multimodal sequences requires aligning the video frames with the corresponding sentences in the text. Previous methods for connecting language and video relied on manual annotations, which are often tedious and expensive to collect. In this thesis, we focus on automatically aligning sentences with the corresponding video frames without any direct human supervision.

We first propose two hierarchical generative alignment models, which jointly align each sentence with the corresponding video frames and each noun in a sentence with the corresponding object in the video frames. Next, we propose several latent-variable discriminative alignment models, which incorporate rich features involving verbs and video actions and outperform the generative models. Our alignment algorithms are applied primarily to aligning biological wet-lab videos with text instructions. Furthermore, we extend our alignment models to automatically align movie scenes with their associated scripts and to learn word-level translations between language pairs for which no bilingual training data is available.

Thesis: By exploiting the temporal ordering constraints between a video and its associated text, it is possible to automatically align the sentences in the text with the corresponding video frames without any direct human supervision.
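The thesis statement rests on the observation that instructions and the video events they describe occur in the same order. As a rough illustration only, and not the generative or discriminative models proposed in the thesis, the sketch below shows how such a temporal ordering constraint can be enforced with a dynamic-programming (DTW-style) monotonic alignment. It assumes a precomputed sentence-to-frame similarity matrix `sim` and at least as many frames as sentences; the function name `monotonic_align` and the example scores are hypothetical.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def monotonic_align(sim: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Assign every video frame to exactly one sentence, preserving order.

    Hypothetical illustration: sim[i][j] is an assumed similarity score between
    sentence i and frame j. Returns (total score, labels), where labels[j] is
    the sentence index of frame j. Labels start at 0, end at n - 1, never
    decrease, and never skip a sentence (the temporal ordering constraint).
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    if n == 0 or m < n:
        raise ValueError("need at least one sentence and no fewer frames than sentences")

    # dp[j][i]: best total score for frames 0..j when frame j belongs to sentence i.
    dp = [[NEG_INF] * n for _ in range(m)]
    back = [[0] * n for _ in range(m)]  # sentence index of frame j - 1 on the best path
    dp[0][0] = sim[0][0]  # the first frame must belong to the first sentence

    for j in range(1, m):
        for i in range(n):
            stay = dp[j - 1][i]  # frame j continues sentence i
            advance = dp[j - 1][i - 1] if i > 0 else NEG_INF  # frame j starts sentence i
            if stay == NEG_INF and advance == NEG_INF:
                continue  # state unreachable under the ordering constraint
            if stay >= advance:
                dp[j][i], back[j][i] = stay + sim[i][j], i
            else:
                dp[j][i], back[j][i] = advance + sim[i][j], i - 1

    # Backtrack from the last frame, which must belong to the last sentence.
    labels = [0] * m
    i = n - 1
    for j in range(m - 1, -1, -1):
        labels[j] = i
        i = back[j][i]
    return dp[m - 1][n - 1], labels


if __name__ == "__main__":
    # Two sentences, four frames (made-up scores for illustration).
    sim = [
        [0.9, 0.8, 0.1, 0.0],
        [0.1, 0.2, 0.7, 0.9],
    ]
    score, labels = monotonic_align(sim)
    print(score, labels)
```

With the made-up scores above, the first two frames are assigned to the first sentence and the last two to the second, giving labels `[0, 0, 1, 1]`. Because the constraint forbids jumping backward in the text, the search space shrinks from all frame-to-sentence assignments to order-preserving segmentations, which the table fills in O(frames × sentences) time.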

Keywords/Search Tags:

Video, Text, Align, Language

Related items

1	Research Of Non Domain Knowledge Dependent Text Summarization Method
2	Embodiment of text after conceptualism: Language and video in 'Fast Trip, Long Drop' (1993) and 'Cornered' (1988)
3	Research On Video Text Extraction And The Application In Virtual Karaoke
4	Research On Video OCR
5	Text Extraction In Video
6	Research On The Technology Of Video Text Information Extraction
7	Study On Language Text Expression Of Tik Tok Funny Short Video
8	Research And Implementation Of Text Recognition In Video
9	Research On Video Text Information Extraction Based On Features Integration
10	The Research On Cross Language Text Categorization Based On Interlingua Semantic