Research On Key Technologies In Multimedia Event Detection

Posted on: 2016-05-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Y Liu
GTID: 1220330479493422
Subject: Computer application technology
Abstract/Summary:
With the rapid development of science and technology, the multimedia collections available to people, such as video clips provided by online portals like Youku and mobile applications like Weishi, are expanding, which urgently calls for effective multimedia content analysis approaches to meet increasingly diverse demands. As an emerging branch of multimedia content analysis, multimedia event detection is attracting growing attention from researchers. Most current research on multimedia event detection has focused on detecting specific simple events, such as sports and news events in controlled video clips, or abnormal events in surveillance video clips, which is far from the goal of detecting general complex events. In order to realize complex and generic event detection, this dissertation systematically studies several key techniques in multimedia event detection, including feature representation and feature classification, based on a comprehensive literature review. Specifically, the research work and major contributions of this dissertation can be summarized as follows:

(1) Performing multimedia event detection in uncontrolled videos requires a very large number of labeled videos for training the event classifier, which becomes quite challenging when there are many events. Because an event usually involves several spatio-temporal objects, one intuitive solution is to model those objects from a large number of labeled images, which can be obtained easily from standard image datasets such as the ImageNet challenge dataset, and to model their spatio-temporal relationships from a relatively small number of labeled videos, which can also be obtained easily from standard video datasets such as the TRECVID MED 2012 dataset.
Accordingly, this dissertation proposes a latent group logistic regression (latent GLR) mixture model for those objects and an event bank descriptor for their spatio-temporal relationships. Furthermore, it develops an efficient iterative training algorithm to learn the parameters of each latent GLR mixture model, combining coordinate descent and gradient descent to minimize the ℓ2,1-norm, or group, regularized logistic loss function. Extensive experiments evaluate the object detection performance of the latent GLR mixture model on the ImageNet challenge dataset and the event detection performance of the event bank descriptor on the TRECVID MED 2012 dataset. The results show that the solution based on the latent GLR mixture model and the event bank descriptor improves overall event detection performance by 10.6%, 7.5% and 6.3% in terms of MAP, MPMiss and MMinNDC, respectively.

(2) General complex events usually contain many visual attributes, such as objects, scenes and human actions. Unlike visual features, visual attributes are hidden classes to event classifiers. Hence, a proper representation of these visual attributes could help improve the quality of multimedia event detection (MED). This dissertation uses a Gaussian mixture model (GMM) to represent video events, motivated by the idea that the individual component densities of a GMM could model underlying hidden visual attributes, and proposes an ℓ2 regularized logistic Gaussian mixture regression approach, also called the LLGMM classifier, for more generic and complex MED. It then proposes an efficient iterative algorithm that uses gradient descent, a standard convex optimization method, to solve the LLGMM objective function. Finally, extensive experiments are conducted on the challenging TRECVID MED 2012 development dataset.
The results show that multimedia event detection with the LLGMM classifier improves overall performance by 14.9%, 2.6% and 6.5% in terms of MAP, MPMiss and MMinNDC, respectively.

(3) Integrating traditional event detection algorithms into the Web environment raises two issues: secure access control and large-scale robust representation. For secure access control, this dissertation proposes a tree proxy-based, service-oriented access control model built on the traditional role-based access control model; it separates the authority loading function from the authority distribution function and dynamically generates a child-sibling linked list structure for an access tree service with compressed privilege information. Verification experiments are conducted on the CloudSim simulation platform, and the results show that the proposed access control model is suitable for dynamic online environments. For large-scale robust representation, inspired by the whole-image-based Object Bank scene descriptor, this dissertation proposes the 1000-Object-Bank (1000 OBK), an event descriptor for interest patches. Feature vectors of the 1000 OBK are extracted from interest patches within the response pyramids of 1000 generic object detectors, which are trained on standard annotated image datasets such as the ImageNet dataset and the Pascal VOC dataset. A spatial bag-of-words tiling approach is then adopted to encode these feature vectors, bridging the gap between objects and events. Furthermore, event classification experiments on the challenging TRECVID MED 2012 dataset show that the robust 1000 OBK event descriptor achieves a speedup of around 1.46 to 4.15 times over state-of-the-art approaches.
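
To make the group regularized logistic loss mentioned in contribution (1) more concrete, the following is a minimal sketch and not the dissertation's latent GLR training algorithm. It assumes a plain binary logistic regression whose weight vector is split into groups, with the penalty taken as the sum of the Euclidean norms of the groups. For simplicity it swaps in proximal gradient descent (group soft-thresholding) instead of the combined coordinate descent and gradient descent scheme described above. All function names, the grouping, and the hyperparameters are hypothetical.

```python
import numpy as np

def group_penalty(w, groups):
    """Group (l2,1-style) penalty: sum of Euclidean norms of each weight group."""
    return float(sum(np.linalg.norm(w[g]) for g in groups))

def logistic_loss(w, X, y):
    """Mean logistic loss for labels y in {-1, +1}."""
    margins = y * (X @ w)
    return float(np.mean(np.logaddexp(0.0, -margins)))

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss with respect to w."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))  # -y * sigmoid(-margin)
    return X.T @ coeff / len(y)

def group_soft_threshold(w, groups, t):
    """Proximal operator of t * group_penalty: shrink each group toward zero."""
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out[g] = scale * w[g]
    return out

def objective(w, X, y, groups, lam):
    """Group regularized logistic loss."""
    return logistic_loss(w, X, y) + lam * group_penalty(w, groups)

def fit(X, y, groups, lam=0.1, step=0.5, iters=200):
    """Proximal gradient descent (an assumed stand-in optimizer)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = group_soft_threshold(w - step * logistic_grad(w, X, y), groups, step * lam)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 4))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)  # only the first group is informative
    groups = [np.array([0, 1]), np.array([2, 3])]
    w = fit(X, y, groups)
    print("weights:", w)
    print("objective:", objective(w, X, y, groups, 0.1))
```

The group penalty shrinks whole groups of weights together, so an uninformative group tends to be driven to exactly zero, which is the usual motivation for this kind of regularizer.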
Keywords/Search Tags: Multimedia Content Analysis, Multimedia Event Detection, Feature Representation, Feature Classification, Uncontrolled Video Clips, Logistic Regression, Gaussian Mixture Model, ℓ2 Regularization, Access Control Model