
Research on Video Semantic Analysis and Retrieval Technology Based on Visual and Auditory Information

Posted on: 2013-07-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L L Yan
Full Text: PDF
GTID: 1228330374499661
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of computer technology, video compression technology and Internet technology, people can access a wide variety of information resources. Video has become increasingly important content in databases because of its intuitiveness and comprehensiveness. However, owing to the characteristics of video, such as its complicated structure, diverse content and spatio-temporal multidimensionality, how to organize and manage video data so that users can find the information they need conveniently and quickly has become an urgent research topic in the field. Content-based video retrieval arose against this background; it involves numerous disciplines, including digital image processing, artificial intelligence, pattern recognition and computer vision. The study of video features and objects aims at understanding the semantic information embedded in video and building an effective retrieval system, so research on video semantic analysis and retrieval is both practical and promising.

This dissertation mainly presents content-based clip retrieval and semantic recognition methods, based on the fusion of visual and audio features, for three video genres: film and TV play video, racquet sports video and news video. The semantic analysis of film and TV play video is studied in depth from an emotional perspective: several low-level visual and audio features are discussed, and an unascertained measure model is built and applied to recognize the emotion type of a video scene. Highlight detection in racquet sports video is then studied by means of audiovisual integration. Furthermore, a news story segmentation method based on the conditional random field (CRF) model is presented and analyzed systematically. The main contributions of this dissertation are as follows:

(1) Based on unascertained mathematics, a novel algorithm for recognizing the affective content of video is proposed by establishing the relationship between low-level video features and the high-level cognitive emotion of a video scene. Firstly, the scene brightness, shot cut rate and color efficacy of a video scene are selected as low-level visual features because they help distinguish different types of human emotion; similarly, several audio emotion features are filtered and analyzed carefully so that the affective content can be recognized together with the visual features. The extraction method for each emotion feature is presented, and the visual and audio emotion feature vectors are constructed accordingly. Secondly, after the unascertained object space and the index space are constructed, three unascertained measure functions are formed to quantify the components of the visual and audio emotion feature vectors, and the unascertained measure emotion matrices are built. Finally, information entropy is applied to determine the weights of each emotion feature vector and its components, and the emotion type of the video scene is obtained according to the credible degree criterion (a brief illustrative sketch is given below). The experimental results verify the feasibility and effectiveness of the proposed algorithm.
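The following is a minimal illustrative sketch, not the dissertation's implementation, of the entropy-weighted unascertained measure classification described in (1). The feature rows, emotion labels, threshold value and the descending-order form of the credible degree criterion are assumptions made for illustration only.

    # Minimal sketch (not the dissertation's code): entropy-weighted
    # unascertained-measure classification with a credible-degree criterion.
    import numpy as np

    def entropy_weights(measure_matrix):
        # Weight each feature by 1 - (normalized information entropy of its row);
        # rows with more concentrated measures receive larger weights.
        m, k = measure_matrix.shape                      # m features, k emotion classes
        p = measure_matrix / measure_matrix.sum(axis=1, keepdims=True)
        h = -np.sum(p * np.log(np.where(p > 0, p, 1.0)), axis=1) / np.log(k)
        v = 1.0 - h
        return v / v.sum()

    def classify_scene(measure_matrix, emotion_labels, lam=0.6):
        # Fuse the per-feature measures and walk the classes in decreasing measure
        # (a simplified, unordered-class variant of the credible degree criterion),
        # returning the class at which the cumulative measure first reaches lam.
        w = entropy_weights(measure_matrix)
        mu = w @ measure_matrix                          # fused measure over classes
        order = np.argsort(mu)[::-1]
        cum = 0.0
        for idx in order:
            cum += mu[idx]
            if cum >= lam:
                return emotion_labels[idx]
        return emotion_labels[order[0]]

    # Hypothetical measure matrix: rows = brightness, shot cut rate,
    # color efficacy, one audio feature; columns = assumed emotion classes.
    M = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1]])
    print(classify_scene(M, ["joy", "fear", "sadness"]))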
(2) A new audiovisual integration scheme for retrieving highlights from racquet sports videos is presented. Firstly, the shots in a racquet sports video are classified into two types, court-view shots and non-court-view shots, by means of image edge detection and an SVM classifier. Then the inherent relations between the highlights and the audio characteristics of racquet sports video are analyzed in depth, and an SVM classifier model is used to distinguish ball-hitting sounds and applause in the audio stream. Finally, a set of rules is established to determine the highlights by integrating the shot-level semantic information and the audio events embedded in the video, and the rally segments are ranked according to their degree of excitement (an illustrative sketch of this fusion step follows the numbered contributions).

(3) A news story segmentation method based on CRFs is presented. First, typical characteristics of the audio content and structure of news video are surveyed and analyzed. By combining rules with an HMM, the audio data are described as a hierarchical structure and subdivided into six semantic categories: anchorman voice, anchorwoman voice, alternate reporting, scene sound, delimiting music and valid silence. Next, the shots in news video are classified into five semantic categories according to the organizational characteristics of the video content: anchor shot, static image shot, interview shot, advertisement shot and other shot; the different semantic shots are detected and recognized with the help of the audio semantic features. After the audio events and semantic shots have been classified, a CRF model is built on the resulting keyword sequences to segment news stories and accomplish semantic recognition and retrieval in news video (an illustrative sketch of this labelling step also follows the numbered contributions).

(4) A content-based semantic recognition and retrieval platform was designed and implemented to validate that the proposed algorithms are effective, practical and perform well.
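The following is a minimal illustrative sketch, not the dissertation's implementation, of the rule-based fusion step in (2). The shot labels and audio events are taken as given outputs of the two SVM classifiers; the applause window, minimum hit count and scoring rule are assumptions.

    # Minimal sketch (assumed interfaces, not the dissertation's code): rule-based
    # fusion of visual shot labels and audio events, followed by ranking.
    from dataclasses import dataclass

    @dataclass
    class Shot:
        start: float          # seconds
        end: float
        label: str            # "court_view" or "non_court_view" (from the visual SVM)

    @dataclass
    class AudioEvent:
        time: float
        kind: str             # "hit" or "applause" (from the audio SVM)

    def detect_highlights(shots, events, min_hits=3, applause_window=5.0):
        # Keep court-view (rally) shots that contain enough ball-hitting sounds
        # and are followed by applause, then rank them by an assumed score.
        ranked = []
        for shot in shots:
            if shot.label != "court_view":
                continue
            hits = [e for e in events
                    if e.kind == "hit" and shot.start <= e.time <= shot.end]
            applause = [e for e in events
                        if e.kind == "applause"
                        and shot.end <= e.time <= shot.end + applause_window]
            if len(hits) >= min_hits and applause:
                score = len(hits) + 2 * len(applause)    # assumed excitement score
                ranked.append((score, shot))
        ranked.sort(key=lambda pair: pair[0], reverse=True)
        return [shot for _, shot in ranked]

    # Hypothetical example.
    shots = [Shot(0.0, 12.0, "court_view"), Shot(12.0, 20.0, "non_court_view")]
    events = [AudioEvent(2.0, "hit"), AudioEvent(5.0, "hit"),
              AudioEvent(9.0, "hit"), AudioEvent(13.0, "applause")]
    print(detect_highlights(shots, events))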
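The following is a minimal illustrative sketch, not the dissertation's implementation, of the CRF labelling step in (3), written with the third-party sklearn-crfsuite package. The feature design, label names and toy training data are assumptions.

    # Minimal sketch (assumed feature design, not the dissertation's code):
    # CRF labelling of keyword (shot class, audio class) sequences.
    import sklearn_crfsuite

    def shot_features(seq, i):
        # Feature dict for the i-th shot of one news programme: its semantic shot
        # class, its audio class, and the previous shot's classes for context.
        shot_class, audio_class = seq[i]
        feats = {"shot": shot_class, "audio": audio_class}
        if i > 0:
            feats["prev_shot"], feats["prev_audio"] = seq[i - 1]
        else:
            feats["BOS"] = True
        return feats

    def to_crf_input(sequences):
        return [[shot_features(seq, i) for i in range(len(seq))] for seq in sequences]

    # Hypothetical toy data: one (shot class, audio class) sequence per programme,
    # labelled with story boundaries.
    X_train = to_crf_input([[("anchor_shot", "anchorman_voice"),
                             ("other_shot", "scene_sound"),
                             ("anchor_shot", "anchorwoman_voice")]])
    y_train = [["story_begin", "inside_story", "story_begin"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
    crf.fit(X_train, y_train)
    print(crf.predict(to_crf_input([[("anchor_shot", "anchorman_voice"),
                                     ("interview_shot", "alternate_reporting")]])))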
Keywords/Search Tags: Video Retrieval, Multimodality Information Fusion, Unascertained Measure, Emotion Type, Racquet Sports Video, Conditional Random Fields, News Story Segmentation