A Research On Visual Content Analysis Towards Video Mining

Posted on: 2010-05-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q S Luo
GTID: 1118360302966580
Subject: Communication and Information System
Abstract/Summary:
Video mining has promising applications: it performs data mining for different goals and tasks by automatically analyzing the content of raw videos. In particular, hidden patterns of interest can be discovered and useful knowledge can be obtained, which supports information analysis and transaction decision-making. However, the difficulty of video content understanding limits the development of video mining.

This dissertation addresses the key problems of video content understanding for video mining. It seeks solutions for video syntax segmentation and semantic information extraction, which bridge the gap between data mining and video sequences. Several algorithms are proposed or improved, and video content understanding is achieved through visual information analysis. The main contributions are as follows.

First, automatic shot detection, which performs syntax segmentation, is the first step toward video content understanding. A continuous color histogram is proposed, based on the idea of distance interpolation; the resulting histogram avoids the interval effect. In addition, Spatial Pyramid Matching is introduced to add geometric constraints to frame matching. To determine shot boundaries, a similarity evolution matrix is proposed to characterize potential shot boundaries, and Dynamic Time Warping is introduced to match these matrices against several matrix templates. The result is a unified shot detection method that can detect both abrupt and gradual boundaries.

Second, to achieve automatic annotation of visual objects in videos, a grid-based Mean-Shift method is proposed, which treats video recognition as a problem of detection and tracking on histogram features. With this method, a set of exemplars represents an object, and an efficient detector scans whole video frames at multiple scales and rotations. Furthermore, detection runs together with tracking, and the previously obtained objects are updated. Finally, continuous video recognition is achieved by combining the results of detection and tracking.

Third, a holistic approach based on spatio-temporal volumes is proposed for the automatic annotation of visual actions. The detection problem is not limited to controlled settings such as stationary backgrounds or invariant illumination, but is studied in real scenarios. To develop an effective representation that remains resistant to background motion, only motion information is used to define descriptors for action volumes. Based on the calculation of optical flow, three types of local motion histograms are designed to describe the action inside a spatio-temporal volume. Action models are then learned using boosting techniques that select discriminative features for efficient classification.

Additionally, a part-based approach based on spatio-temporal cuboids is proposed for the automatic annotation of visual actions. To ensure that enough cuboids can be extracted, an improved detector detects interest points at multiple frequencies. The density and number of interest points can be adjusted through different combinations of frequencies as required, which yields a scalable description of an action. To make full use of the structural information among cuboids, the concept of a word triplet is presented, which builds an explicit shape model describing the relative positions of cuboids. The classic probabilistic Latent Semantic Analysis is introduced to perform the part-based action detection.

Finally, an approach that uses low-level features to detect and localize irregularities in surveillance video is proposed. Without predefining rules or learning explicit models to describe regularities and irregularities, the detection problem is formulated as querying newly observed cuboids against a database built from several video clips containing only regular behaviors. A descriptor is designed to characterize a spatio-temporal cuboid by fusing appearance, motion, and spatio-temporal configuration. To infer irregular cuboids from videos, a "K-best" probabilistic inference algorithm finds the maximum-likelihood (ML) estimate for each cuboid and checks whether the current part of the behavior is irregular. Experiments on real-world videos have validated the approach quantitatively.
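
The abstract describes the continuous color histogram of the first contribution only as being based on distance interpolation. As a rough illustration of that idea, and not the dissertation's actual definition, the Python sketch below lets each sample split its unit weight between the two nearest bin centers according to its distance from them. The function name `soft_histogram`, the uniform bin layout, and the linear weighting are all assumptions made for this example.

```python
def soft_histogram(values, n_bins, lo=0.0, hi=256.0):
    """Histogram in which each sample shares its weight between the two
    nearest bin centers (hypothetical illustration, linear weighting)."""
    width = (hi - lo) / n_bins
    hist = [0.0] * n_bins
    for v in values:
        # Position of the sample in units of bin width, measured from
        # the center of the first bin.
        pos = (v - lo) / width - 0.5
        # Samples beyond the outermost centers go entirely to the edge bin.
        pos = min(max(pos, 0.0), n_bins - 1.0)
        left = int(pos)                 # bin center at or below the sample
        frac = pos - left               # normalized distance to that center
        right = min(left + 1, n_bins - 1)
        hist[left] += 1.0 - frac        # closer center receives more weight
        hist[right] += frac
    return hist


def l1_distance(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Two nearly identical values that straddle a hard bin edge at 1.0.
a = soft_histogram([0.99], n_bins=4, lo=0.0, hi=4.0)
b = soft_histogram([1.01], n_bins=4, lo=0.0, hi=4.0)
print(l1_distance(a, b))
```

With hard binning, the values 0.99 and 1.01 would fall into different bins and the two histograms would differ by the full weight of the sample, which is one plausible reading of the "interval effect" named in the abstract. Under the interpolated scheme the histograms change only slightly when a value crosses a bin edge.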
Keywords/Search Tags: video mining, video content understanding, computer vision, pattern recognition, intelligent visual surveillance