A Local Representation Method Based On Deep Clustering And Its Application To Action Recognition | Posted on:2020-08-02 | Degree:Master | Type:Thesis | Country:China | Candidate:J D Li | Full Text:PDF | GTID:2428330590960633 | Subject:Computer Science and Technology

Abstract/Summary:

Human action recognition is one of the most fundamental and popular research topics in computer vision. Local features are increasingly effective in visual recognition, and local representation methods based on spatio-temporal local features have become highly popular for human action recognition. Local representation methods are easy to use and computationally efficient, and they can cope with most application scenarios in which global representation methods and deep learning methods fail. The Bag-of-Words (BOW) model is the most widely used local representation method in human action recognition. The key step in applying the BOW model is using a traditional clustering algorithm to build a visual vocabulary.

However, traditional clustering algorithms have limitations. First, computing pairwise distances between local feature points and cluster centroids, or pairwise similarities between feature points, has high computational complexity, so traditional clustering algorithms cannot handle action recognition problems with millions of local feature points. Second, the compromise of undersampling feature points to reduce computational complexity may discard distinctive feature points. Finally, the hard cluster assignment that traditional clustering algorithms use to build BOW vectors assigns every feature point to exactly one cluster, which may harm the generalization capacity of the BOW model. These limitations motivate this thesis to propose a BOW model based on deep clustering that builds better BOW vectors for human action recognition.

This thesis proposes an effective deep clustering algorithm, Deep Embedded Regularized Clustering with Modified Dual Autoencoders Features (mDAF-DEPICT), and a BOW model based on it (BOW-mDAF-DEPICT). mDAF-DEPICT first maps the original local feature points of video sequences to a new feature space, generating new representations, and then predicts cluster assignment probabilities for these representations. BOW-mDAF-DEPICT builds BOW vectors for video sequences from the probabilities generated by mDAF-DEPICT.

The effectiveness of the proposed BOW-mDAF-DEPICT is evaluated through experiments on two benchmark human action recognition datasets. BOW-mDAF-DEPICT achieves higher performance than BOW models based on traditional clustering algorithms, at lower computational complexity. End-to-end joint learning is more appropriate than greedy layer-wise disjoint learning for training mDAF-DEPICT. Soft cluster assignment greatly improves the performance of BOW-mDAF-DEPICT compared with hard cluster assignment. The experimental results demonstrate that the proposed BOW-mDAF-DEPICT has promising application value in human action recognition.

Keywords/Search Tags: | Human Action Recognition, Bag-of-Words, Deep Clustering, Autoencoder
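
The abstract contrasts hard cluster assignment with building BOW vectors from cluster assignment probabilities. The sketch below illustrates that difference only; it is not the thesis's implementation. It assumes that a video's BOW vector is obtained by summing the per-point probabilities over all local feature points and L1-normalizing the result, which the abstract does not specify. The function names `bow_soft` and `bow_hard` and the probability values are hypothetical.

```python
from typing import List


def bow_soft(probs: List[List[float]]) -> List[float]:
    """Soft assignment: sum each point's cluster probabilities, then L1-normalize.

    Assumption: probs[i][j] is the predicted probability that local feature
    point i belongs to cluster (visual word) j.
    """
    k = len(probs[0])
    hist = [0.0] * k
    for p in probs:
        for j, pj in enumerate(p):
            hist[j] += pj
    total = sum(hist)
    return [h / total for h in hist]


def bow_hard(probs: List[List[float]]) -> List[float]:
    """Hard assignment: each point votes only for its most probable cluster."""
    k = len(probs[0])
    hist = [0.0] * k
    for p in probs:
        best = max(range(k), key=lambda j: p[j])
        hist[best] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


# Hypothetical probabilities for three feature points and K = 3 visual words.
probs = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.2, 0.6],
]
soft = bow_soft(probs)
hard = bow_hard(probs)
print("soft:", soft)
print("hard:", hard)
```

In this toy input the second visual word never wins the hard vote, so its hard BOW entry is zero, while the soft BOW vector still records the probability mass the points place on it.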