
Research On Large Scale Web Video Event Mining

Posted on: 2016-01-15    Degree: Doctor    Type: Dissertation
Country: China    Candidate: C D Zhang    Full Text: PDF
GTID: 1228330461474268    Subject: Computer application technology
Abstract/Summary:
With the increasing popularity of the Internet, users can easily retrieve large numbers of relevant web videos about ongoing incidents or events through search engines and video-sharing websites such as Google, Baidu, YouTube, and Youku. In addition, news providers such as CNN, BBC, and CCTV publish news videos in digital form. This creates a new challenge: it is hard for users to grasp the major events from the numerous web videos returned by a search [1]. At the same time, it is also an opportunity, since web video mining can help users grasp the major events after only a glance at the search results.

Generally speaking, when searching for a topic, most users first want to know the major events and then to build the relationships among them in their minds. At present, however, they have to click on the videos one by one and summarize them manually after watching all of them, which is not only highly time consuming but also makes it difficult to find what they want, especially for unfamiliar topics. It has therefore become crucial to group relevant videos together automatically. This thesis studies the bursty behavior of visual and textual features in depth and investigates how the two kinds of information can be integrated, with the following contributions.

Firstly, the thesis proposes a web video event mining framework that integrates correlation and co-occurrence information. Compared with text documents, web videos usually carry fewer textual features, and these features are often noisy, ambiguous, and incomplete. At the same time, important visual shots are frequently inserted into related videos as a reminder or to support a viewpoint, acting much like hot terms in the text domain; the resulting near-duplicate keyframes (NDKs) carry useful content information and can be used to group videos with similar themes into events. The proposed framework therefore integrates textual and visual information: NDK-within-video information computed from co-occurrence measures the similarity between NDKs and events, and Multiple Correspondence Analysis (MCA), a statistical method, is applied to NDK-level event mining to capture the correlation between terms and classes, bridging the gap between NDKs and high-level semantic concepts. Experimental results on large-scale web videos from YouTube demonstrate that the framework outperforms several existing mining methods and obtains good results for web video event mining.
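As a rough illustration of the co-occurrence idea above, the sketch below (a toy example, not the thesis's implementation; the matrix layout, function name, and the Jaccard-style measure are all assumptions made for the example) derives an NDK-to-NDK similarity from a binary NDK-by-video membership matrix:

```python
# Toy sketch of NDK-within-video co-occurrence (illustrative only):
# M[i, j] = 1 if near-duplicate keyframe i appears in video j.
import numpy as np

def ndk_cooccurrence_similarity(membership):
    """Jaccard-style similarity between NDKs: pairs that share many
    videos (i.e. co-occur often) score close to 1."""
    M = membership.astype(float)
    shared = M @ M.T                       # videos shared by each NDK pair
    counts = M.sum(axis=1)                 # videos containing each NDK
    union = counts[:, None] + counts[None, :] - shared
    return np.divide(shared, union, out=np.zeros_like(shared), where=union > 0)

# Usage: 4 NDKs spread across 5 videos.
M = np.array([[1, 1, 0, 0, 1],
              [1, 1, 0, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 1, 0, 0]])
print(ndk_cooccurrence_similarity(M).round(2))
```

In the thesis's pipeline, MCA would then relate the nominal term features of each NDK group to the event classes; that step is not reproduced in this sketch.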
Secondly, the thesis proposes a web video event mining framework that integrates content-based visual temporal information with textual distribution information. An event is generally composed of several scenes, but the groups obtained from NDK-within-video (co-occurrence) information each represent only one scene. Because the visual near-duplicate feature trajectory captures the time distribution of an NDK, it can cluster NDKs that belong to the same event but show different viewpoints; conversely, NDK-within-video information strengthens the feature trajectory, which on its own is easily affected by NDK editing and detection errors. We therefore integrate NDK-within-video information with the visual near-duplicate feature trajectory, treated together as content-based visual temporal information, to cluster more NDKs belonging to the same event. In addition, a method at the NDK level accurately and efficiently learns tag relevance from the visual content relationships between NDKs, enhancing the robustness of the textual information. Experiments on large-scale web videos from YouTube show that the proposed framework performs better than several existing methods for web video event mining.

Thirdly, the thesis proposes a web video event mining framework that integrates adaptive association rule mining with near-duplicate segments. The concept of Near-Duplicate Segments (NDS) is introduced to describe the visual relevance between video segments and to build latent connections among different videos. The segments are represented with spatio-temporal local features, which capture the main content of web videos while reducing the influence of video editing. Semantically related terms are grouped together through an adaptive association rule mining (ARM) method, and the statistics and distribution characteristics of the grouped terms within near-duplicate keyframe groups are used to make event classification more robust. The framework integrates textual and visual information so that each compensates for the weaknesses of the other, which significantly improves the overall performance of web video event mining. Experimental results on large-scale web videos from YouTube demonstrate that the proposed method achieves good performance and outperforms the selected baseline methods.
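For readers unfamiliar with association rule mining, the following toy sketch shows how terms that frequently co-occur across videos can be linked. It is built on illustrative assumptions only: each video contributes a set of title/tag terms, fixed support and confidence thresholds are used, and the adaptive part of the thesis's ARM method is not reproduced.

```python
# Plain association rule mining over per-video term sets (illustrative only).
from itertools import combinations
from collections import Counter

def mine_term_rules(term_sets, min_support=0.3, min_confidence=0.6):
    """term_sets: list of sets of terms (one set per video).
    Returns rules (lhs -> rhs, support, confidence) above the thresholds."""
    n = len(term_sets)
    single = Counter(t for s in term_sets for t in s)
    pair = Counter(frozenset(p) for s in term_sets
                   for p in combinations(sorted(s), 2))
    rules = []
    for p, cnt in pair.items():
        if cnt / n < min_support:
            continue
        a, b = tuple(p)
        for lhs, rhs in ((a, b), (b, a)):
            conf = cnt / single[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, cnt / n, conf))
    return rules

# Usage: terms extracted from the titles/tags of five videos.
videos = [{"earthquake", "japan", "tsunami"},
          {"earthquake", "japan"},
          {"japan", "tsunami"},
          {"earthquake", "tsunami", "japan"},
          {"election", "debate"}]
for lhs, rhs, sup, conf in mine_term_rules(videos):
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

Grouping terms in this manner before measuring their distribution over NDK groups is what lets the textual side tolerate the sparse, noisy tags attached to individual web videos.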
Keywords/Search Tags: Web video, event mining, Near-Duplicate Keyframes (NDK), Near-Duplicate Segments (NDS), Multiple Correspondence Analysis (MCA), feature trajectory, co-occurrence, Association Rule Mining (ARM)