
Local Feature And Generative Model For Human Action Recognition

Posted on: 2014-01-29    Degree: Master    Type: Thesis
Country: China    Candidate: W Shi    Full Text: PDF
GTID: 2248330392960898    Subject: Computer Science and Technology
Abstract/Summary:
In this thesis, we propose and implement a human action recognition system with relatively high accuracy to meet the demand of recognizing and classifying human actions in videos. During system design, we analyze and evaluate recent state-of-the-art methods and systems for action recognition and detection, and based on this previous work we propose a new practical system and method for dealing with problems in the real world. The main contributions of this thesis are the following:

(1) We design and implement a module-based action recognition system using the pipeline paradigm. Modules in the system are loosely coupled with each other, which makes it possible to swiftly change or adjust a single algorithm and thus to evaluate algorithms for different purposes in an isolated environment. This design guarantees the flexibility of the whole system for further improvement and feature extension.

(2) A number of state-of-the-art methods and systems in action recognition are studied and evaluated. We analyze their action models and recognition algorithms, together with their advantages and disadvantages under different environments. Our system is based on the bag-of-words representation: we first detect spatio-temporal interest points in the video, which are points with locally maximal variation along both the time and space domains. Local features are extracted around these interest points using an extended HOG algorithm and then quantized into a visual vocabulary, so that a video is converted into a set of unordered visual words. After that we apply an LDA topic model to these visual documents to sample their distributions over latent topics. The distance between videos is computed as the Bhattacharyya distance between their topic distributions and is finally used for action classification (a minimal code sketch of this pipeline is given below).

(3) We divide the recognition process into two cases: action classification of a single video, and human action detection and recognition in a long video. For the latter case we design and implement an efficient way to split a long video into short ones with a moving window.

(4) Our system is tested on different public human action video datasets, which shows that our method is competitive with state-of-the-art methods. Our accuracy on the simple dataset is higher than that of all other bag-of-words methods; on the more complex dataset, although our method does not achieve the highest accuracy, it runs 20 times faster than the current best method. We also conduct experiments to find out how the size of the visual dictionary and the number of latent topics influence system performance, and obtain constructive results for future system configuration.
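The sketch below illustrates the bag-of-words / LDA pipeline summarized in contribution (2). It is not the thesis implementation: it assumes local descriptors (e.g. extended-HOG features around spatio-temporal interest points) have already been extracted per video, and the vocabulary size, topic count, and feature dimensionality used here are illustrative assumptions only.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation


def build_vocabulary(all_descriptors, vocab_size=500):
    """Quantize local descriptors (N x D array) into a visual vocabulary via k-means."""
    kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
    kmeans.fit(all_descriptors)
    return kmeans


def video_to_histogram(descriptors, kmeans):
    """Map one video's descriptors to a bag-of-visual-words histogram."""
    words = kmeans.predict(descriptors)
    return np.bincount(words, minlength=kmeans.n_clusters).astype(float)


def fit_topics(histograms, n_topics=30):
    """Fit an LDA topic model on the per-video word histograms."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(histograms)            # per-video topic mixtures
    theta /= theta.sum(axis=1, keepdims=True)        # ensure rows sum to 1
    return lda, theta


def bhattacharyya_distance(p, q):
    """Distance between two topic distributions, used here for classification."""
    bc = np.sum(np.sqrt(p * q))                      # Bhattacharyya coefficient
    return -np.log(max(bc, 1e-12))


if __name__ == "__main__":
    # Toy usage with random descriptors standing in for extended-HOG features.
    rng = np.random.default_rng(0)
    videos = [rng.normal(size=(200, 72)) for _ in range(3)]   # 72-D is assumed
    km = build_vocabulary(np.vstack(videos), vocab_size=50)
    hists = np.array([video_to_histogram(v, km) for v in videos])
    lda, theta = fit_topics(hists, n_topics=5)
    print(bhattacharyya_distance(theta[0], theta[1]))

In a nearest-neighbour setting, a query video would be assigned the label of the training video with the smallest Bhattacharyya distance between topic distributions; the sliding-window splitting for long videos from contribution (3) would simply feed each window through the same pipeline.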
Keywords/Search Tags:action recognition, spatio-temporal interest point, local XYT feature, bag-of-words, topic model