
Spatio-temporal Feature Learning And Human Activity Analysis In Complex Scenes

Posted on: 2013-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhu
Full Text: PDF
GTID: 2218330362459209
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
This thesis studies an important problem in computer vision: human activity analysis and spatio-temporal feature learning in complex scenes. The problem touches both basic vision tasks such as feature detection and description and higher-level machine learning tasks such as semantic analysis and understanding. The thesis focuses on human activity recognition and classification, covering sparse coding with local spatio-temporal features, unsupervised deep learning of spatio-temporal features from video, and crowd counting and analysis based on unsupervised Bayesian trajectory clustering. Building on a broad survey of the literature from related international conferences and journals, the author proposes several new algorithms and methods and validates them through engineering applications and experiments.

Research on human action recognition. A new human action recognition method is proposed that replaces the traditional "Bag of Words" model with sparse coding on local spatio-temporal features and then obtains a global video representation through max-pooling. The author also studies dictionary learning for sparse coding and combines it with transfer learning to generalize the dictionary to related tasks and improve classification accuracy.

Research on spatio-temporal feature learning. Inspired by deep learning theory, the author proposes a hierarchical distributed probabilistic model that learns the invariance of spatio-temporal features in an unsupervised way. Given an input video, the model learns hierarchical feature representations bottom-up, without labels. Experiments show that unsupervised learning without label information can achieve accuracy comparable to supervised learning on action recognition tasks.

Research on crowd analysis and counting algorithms. The proposed method works without any prior model for detecting humans. By tracking low-level visual features, it obtains a set of trajectories of moving people and clusters them without supervision to estimate the total number of people in the video. The author and colleagues in the lab collected the video data and demonstrated the effectiveness of the algorithm on this dataset.
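
The action recognition pipeline described above (sparse coding of local descriptors followed by max-pooling into one video-level vector) can be illustrated with a minimal sketch. This is not the author's implementation: the ISTA solver, the random dictionary, the descriptor dimensions, and the names `sparse_codes` and `video_representation` are all assumptions chosen for illustration, and the thesis learns its dictionary rather than sampling it.

```python
import numpy as np


def sparse_codes(X, D, lam=0.1, n_iter=100):
    """Sparse-code each row of X over dictionary D with ISTA (assumed solver).

    Minimises 0.5 * ||A @ D - X||^2 + lam * ||A||_1 over the codes A.
    X: (n_descriptors, dim), D: (n_atoms, dim), returns A: (n_descriptors, n_atoms).
    """
    # Lipschitz constant of the gradient of the quadratic term.
    L = np.linalg.norm(D @ D.T, 2)
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T          # gradient of the reconstruction error
        Z = A - grad / L                  # gradient step
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A


def video_representation(descriptors, D, lam=0.1):
    """Max-pool absolute code values over all local descriptors of one video."""
    codes = sparse_codes(descriptors, D, lam)
    return np.abs(codes).max(axis=0)


# Hypothetical data: 50 local spatio-temporal descriptors of dimension 16
# and a random, row-normalised dictionary with 32 atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 16))
D /= np.linalg.norm(D, axis=1, keepdims=True)
descriptors = rng.standard_normal((50, 16))

video_vector = video_representation(descriptors, D)
print(video_vector.shape)
```

The resulting fixed-length vector has one entry per dictionary atom regardless of how many descriptors the video contains, which is what allows it to feed a standard classifier in place of a Bag of Words histogram.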
Keywords/Search Tags:action recognition, sparse coding, deep learning, deep belief network, feature learning, trajectory clustering, crowd counting