
Research On Video Action Recognition Method Based On Deep Learning

Posted on: 2022-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
Full Text: PDF
GTID: 2518306545498624
Subject: Intelligent Building
Abstract/Summary:
In recent years, video action recognition has become one of the hotspots in the computer vision community, with important application value in video retrieval, intelligent surveillance, intelligent medical treatment, human-computer interaction and other fields. The complexity of human actions, the diversity of video subjects and backgrounds, and changes of viewpoint pose great challenges to the accurate recognition of human behavior. Traditional machine learning methods struggle to adapt to action recognition in complex scenes, whereas convolutional neural networks based on deep learning can extract deep video features and therefore have broad application prospects in video action recognition. Based on a summary and analysis of previous research in the field of action recognition, this thesis makes the following contributions:

(1) In video retrieval and intelligent surveillance, the action recognition task usually needs to analyze and encode the whole video content. To address the loss of low-level spatiotemporal features in 3D convolutional neural networks, a Multi-Layer Spatiotemporal Information Fusion Network (MLSIFN) is proposed. First, a backbone network extracts the high-level and low-level spatiotemporal features of the video. Then, the semantic information contained in the high-level spatiotemporal features is injected into the low-level spatiotemporal features through semantic feature embedding and fusion, enhancing their semantic expressiveness, and a feature fusion module aggregates global spatiotemporal information to improve the network's representation of spatiotemporal features. Finally, different strategies for fusing high-level and low-level features are explored experimentally. The experimental results show that the proposed feature fusion structure performs well on the action recognition task.
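To make the idea in (1) concrete, below is a minimal PyTorch sketch of multi-layer spatiotemporal feature fusion, assuming a 3D-CNN backbone that exposes a low-level feature map (fine resolution, few channels) and a high-level one (coarse resolution, many channels). The channel sizes, the 1x1x1 projection, the additive injection of semantics, and the class count are illustrative assumptions, not the thesis's exact MLSIFN design.

# Minimal sketch of multi-layer spatiotemporal fusion (assumed design choices).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticEmbeddingFusion(nn.Module):
    def __init__(self, low_channels=256, high_channels=1024, num_classes=101):
        super().__init__()
        # Project high-level semantics to the low-level channel width.
        self.project = nn.Conv3d(high_channels, low_channels, kernel_size=1)
        # Classify from the pooled, fused representation of both levels.
        self.fc = nn.Linear(low_channels + high_channels, num_classes)

    def forward(self, low_feat, high_feat):
        # low_feat:  (N, 256, T, H, W)    high_feat: (N, 1024, T', H', W')
        sem = self.project(high_feat)
        # Upsample coarse semantics to the low-level resolution and inject them.
        sem = F.interpolate(sem, size=low_feat.shape[2:], mode="trilinear",
                            align_corners=False)
        enriched_low = low_feat + sem
        # Aggregate global spatiotemporal information from both levels.
        pooled = torch.cat([
            F.adaptive_avg_pool3d(enriched_low, 1).flatten(1),
            F.adaptive_avg_pool3d(high_feat, 1).flatten(1),
        ], dim=1)
        return self.fc(pooled)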
(2) Action recognition methods based on convolutional neural networks have limited long-range temporal modeling ability, and their recognition performance degrades when illumination changes, viewpoint changes and complex backgrounds strongly interfere with the video images. To address these problems, a Video Action Recognition Method Combining Spatiotemporal Features and Optical Flow Information (AMCSOF) is proposed. First, a uniform sparse sampling strategy models the whole video segment in the temporal domain, fully preserving long-range temporal information while reducing frame redundancy. Second, because optical flow is less affected by differences in the moving subject and by complex backgrounds, and also reflects the direction and speed of the moving subject, an optical-flow stream is built on top of the Multi-Layer Spatiotemporal Information Fusion Network (MLSIFN); a spatial attention module strengthens the key information in the optical-flow feature maps, and the complementary strengths of the two data modalities improve the robustness of the network across scenarios. Finally, the extracted spatiotemporal features and optical-flow information are combined by decision fusion at the end of the network (see the sketch after (3)). The experimental results show that the model can perform temporal modeling over long video segments and improves the accuracy of action recognition in complex and changing scenes.

(3) Compared with video images, skeleton data expresses actions more concisely and efficiently when object or scene context is not involved. This thesis also applies this approach to action recognition, focusing on the recognition of human falls, and proposes a Fall Recognition Method Based on Human Posture Characteristics (FRMPC). First, the OpenPose human pose estimation algorithm extracts the coordinates of human skeleton key points from the video frames. Second, an analysis of falls among the elderly identifies the key points whose coordinates change significantly when a fall occurs, and a human posture feature vector is extracted from them. Finally, fall recognition is completed by training an action classification network on these features. The experimental results show that the proposed method can be used to monitor the daily activities of elderly people living alone.
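Returning to (2), the following minimal sketch illustrates two of its pieces: uniform sparse sampling over the whole video and decision-level fusion of the RGB and optical-flow streams. The segment count, centre-of-segment sampling and equal score weighting are illustrative assumptions, not the thesis's exact settings.

# Minimal sketch of sparse sampling and two-stream decision fusion (assumed settings).
import torch


def sparse_sample_indices(num_frames: int, num_segments: int = 8) -> list:
    """Pick one frame from the centre of each of num_segments equal chunks."""
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


def fuse_decisions(rgb_logits: torch.Tensor, flow_logits: torch.Tensor,
                   flow_weight: float = 0.5) -> torch.Tensor:
    """Average the two streams' softmax scores and return the predicted class."""
    rgb_scores = torch.softmax(rgb_logits, dim=-1)
    flow_scores = torch.softmax(flow_logits, dim=-1)
    fused = (1.0 - flow_weight) * rgb_scores + flow_weight * flow_scores
    return fused.argmax(dim=-1)


# Usage: sample frame indices with sparse_sample_indices(len(video_frames)),
# run the sampled RGB clip and the corresponding optical-flow stack through
# their own networks, then combine the per-class scores with fuse_decisions.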
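A minimal sketch of the pose-feature idea in (3) follows, assuming per-frame key points have already been extracted (e.g., by OpenPose in COCO-18 order) as (x, y) coordinates. The specific hand-crafted features (torso tilt, key-point bounding-box aspect ratio, hip descent speed), the key-point indices and the small MLP classifier are illustrative assumptions about how such a feature vector could be built, not the thesis's exact FRMPC features.

# Minimal sketch of fall-oriented pose features from extracted key points (assumed features).
import numpy as np
import torch
import torch.nn as nn

NECK, R_HIP, L_HIP = 1, 8, 11  # assumed COCO-18 key-point indices


def pose_features(frames: np.ndarray) -> np.ndarray:
    """frames: (T, K, 2) array of key-point coordinates for one clip."""
    hips = (frames[:, L_HIP] + frames[:, R_HIP]) / 2.0
    neck = frames[:, NECK]
    torso = neck - hips
    # Torso tilt from vertical: approaches 90 degrees when the body is horizontal.
    tilt = np.degrees(np.arctan2(np.abs(torso[:, 0]), np.abs(torso[:, 1]) + 1e-6))
    # Width/height ratio of the key-point bounding box: grows when lying down.
    spans = frames.max(axis=1) - frames.min(axis=1)
    aspect = spans[:, 0] / (spans[:, 1] + 1e-6)
    # Vertical hip velocity (image y grows downward): large values mean rapid descent.
    hip_speed = np.diff(hips[:, 1], prepend=hips[0, 1])
    return np.stack([tilt.max(), aspect.max(), hip_speed.max()])


classifier = nn.Sequential(  # fall vs. non-fall
    nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 2)
)
logits = classifier(torch.tensor(pose_features(np.random.rand(30, 18, 2)),
                                 dtype=torch.float32))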
Keywords/Search Tags: deep learning, action recognition, fall recognition, feature extraction