Research On Human Action Recognition With Deep-learned Features And Its Application In Video Detection

Posted on: 2020-06-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z M Xu
GTID: 1366330590954116
Subject: Computer application technology
Abstract/Summary:
The widespread construction of video surveillance systems in China has changed how public security organs conduct investigations and has greatly advanced video detection technology. In practice, however, surveillance video retrieval lacks corresponding structured context descriptions, so video detection still relies on manual retrieval to find anomalies in video content. Unlike general content-based video retrieval, which focuses on a certain type of action or event with explicit semantic attributes, video detection is interested in specific actions performed by different pedestrians. In addition, in real urban surveillance video the image quality of pedestrians is poor, objects are small in scale, and there are obvious occlusions and changes in viewing angle and illumination. These factors make surveillance video retrieval more difficult than general video retrieval.

As the key technology of video detection, human action recognition can capture the semantic characteristics of a target's actions and identify key suspect points, and it has therefore attracted widespread attention. In recent years it has become a research hotspot and has achieved high accuracy on public data sets. Under complicated conditions, however, its performance drops significantly and cannot meet the practical needs of video detection applications. The difficulty of unrestricted human action recognition lies in the complexity of the scene, the data distribution, the distance measurement, and the application. The technical bottlenecks in these four aspects are:

(1) Because of environmental factors and the imaging characteristics of the device, the surveillance environment in actual video detection is variable. The video scene is more complicated, and traditional dense trajectory features based on global strong corner points may lose expressive power under background noise.

(2) In actual video detection it is often impossible to obtain enough abnormal training samples, and the data distributions of the training and testing samples are inconsistent. If training is not comprehensive, the model cannot generalize to new categories. The data distribution is more complicated, and deep learning models tend to converge prematurely on small data sets.

(3) The classification optimization model relies on the global similarity relationship among samples. In actual video detection, however, both the specific training samples and the globally similar samples are sparse. The distance metric is more complicated, and a classification model based on sample matching may be limited in the original feature space.

(4) Video detection usually needs to find abnormal actions or specific emergencies among multiple pedestrians in complex scenes. The application requirements are more complicated: long untrimmed videos containing abnormal events lack labeling and co-training, while surveillance video requires recognizing the actions of multiple pedestrians at the same time.

To this end, this dissertation studies human action recognition, focusing on human motion description, action sample matching, distance metric tuning, and multi-target action recognition. The innovations are as follows:

(1) Human motion description based on salient motion boundary
Given the complexity of the scene, the monitored environment is changeable and the background contains many strong corner points, so the expressive power of dense trajectory features may decrease under background noise. We evaluate the salient region boundary with global-contrast-based detection and sample dense features based on the salient motion boundary. The proposed method characterizes the action sequences of human activity and enhances the feature expression ability of human action recognition. Experimental results show that it improves average recognition by 2.2%, 3% and 1.5% on Hollywood2, HMDB51 and UCF50, respectively, compared with the baseline.

(2) Classifier design based on semi-supervised discriminant manifold
Given the complexity of the distribution, there are few abnormal action samples and many normal action samples, so a deep learning model may converge prematurely on small data sets. We optimize the projection matrix with discriminant manifold learning and design a classifier training method based on a semi-supervised graph model. The proposed method trains the classifiers in a semi-supervised way and improves the generalization ability of the human action recognition model. Experimental results show that it improves recognition by 4.06%, 3.92%, 5.06% and 3.39% on JHMDB, HMDB51, UCF50 and UCF101, respectively, compared with the baseline.

(3) Distance metric tuning based on kernelized neighborhood embedding
Given the complexity of the measurement, globally similar samples in the Grassmann space are sparse, and insufficient labeled samples in the original feature space may degrade model performance. We apply a similarity measurement that combines kernel functions in a reproducing kernel Hilbert space and tune the distance metric on a kernelized Grassmann manifold. Kernelizing the method enhances the model combination ability of human action recognition. Experimental results show that it improves recognition by 2.97%, 2.59% and 2.40% on JHMDB, HMDB51 and UCF101, respectively, compared with the baseline.

(4) Multi-target action recognition for video detection
Given the complexity of the application, multiple different types of actions occur at the same time, and the human activity area may suffer from occlusion and insufficient annotation information. We combine object detection with a multi-target tracking algorithm to segment the human activity areas, and we co-train labeled and unlabeled samples using a multi-fiber network and pseudo-label deep learning. The trained multi-fiber network model improves as the weak annotation information of the unlabeled samples increases. A multi-target action recognition system is also developed to meet the requirements of video detection. Experimental results show that the method improves recognition by 2.36%, 6.01% and 3.09% when 5%, 15% and 30% of the samples are labeled, respectively, on split 1 of HMDB51 compared with the baseline, with the remaining samples left unlabeled. The multi-target action recognition system can track multiple pedestrians at the same time and automatically label human actions according to their activity areas.
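
The abstract describes the pseudo-label co-training of innovation (4) only at a high level. As a rough illustration of the general pseudo-labeling idea, and not of the dissertation's actual implementation, the sketch below keeps only those unlabeled samples whose predicted class probability exceeds a confidence threshold, so that they can be added to the training set with their predicted labels. The function names, the 0.9 threshold, and the toy logits are all assumptions made for this example; the multi-fiber network itself is not modeled.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(unlabeled_logits, threshold=0.9):
    """Pick confident predictions on unlabeled samples.

    Returns a list of (sample_index, predicted_class, confidence) tuples
    for every sample whose top class probability is at least `threshold`.
    The threshold value is an assumption, not taken from the dissertation.
    """
    selected = []
    for index, logits in enumerate(unlabeled_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence), confidence))
    return selected


# Hypothetical classifier outputs for three unlabeled clips, three action classes.
toy_logits = [
    [4.0, 0.0, 0.0],  # confident prediction for class 0
    [0.2, 0.1, 0.0],  # ambiguous prediction, should be skipped
    [0.0, 0.0, 5.0],  # confident prediction for class 2
]
picked = select_pseudo_labels(toy_logits, threshold=0.9)
for index, label, confidence in picked:
    print(f"clip {index}: pseudo-label {label} (confidence {confidence:.3f})")
```

In a training loop of this kind, the selected clips would typically be merged with the labeled set and the model retrained, with the selection repeated as the model improves. A higher threshold admits fewer but cleaner pseudo-labels.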
Keywords/Search Tags: Human Action Analysis, Semi-Supervised Learning, Deep Learning, Multi-Target Action Recognition