
Human Action Recognition Research Based On Video

Posted on: 2021-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Gao
Full Text: PDF
GTID: 2428330623968302
Subject: Engineering
Abstract/Summary:
With the development of society and the improvement of living standards, computer vision technologies are increasingly applied in daily life. Human action recognition, an important research direction in computer vision, has also received widespread attention. This thesis takes video-based human action recognition as its research topic and focuses on key issues that urgently need to be solved: effectively obtaining the temporal information between image frames, solving the long-term dependence problem in video information, and improving the accuracy and real-time performance of the algorithm. The research is carried out from the following two perspectives.

1. A human action recognition method combining long short-term memory (LSTM) networks and a self-attention mechanism. First, the method transforms the data set into sequences of image frames, then randomly extracts consecutive frames and inputs them into the network model. Second, two models are studied: an LSTM network model and an improved model that adds a self-attention mechanism. The ability of each model to solve the long-term dependence problem in video is then explored step by step, especially for long videos with dynamic scenes and complex human actions. The experimental results show that, in terms of both accuracy and predictive ability, the improved model with the self-attention mechanism performs better than the other models in this chapter.

2. An efficient human action recognition method combining 2D convolution and 3D convolution. First, this method improves the processing of the video: every image frame sequence is divided into N segments, a single frame is randomly sampled from each segment, and the N frames are input to the network together. Then, considering that 3D convolution can also effectively capture information in the time dimension, three models are constructed separately: a 3D convolution model, a series model, and a parallel model, the latter two fusing 2D convolution and 3D convolution. The experimental results show that the series and parallel models achieve excellent accuracy because their network structure effectively combines BN-Inception and ResNet18-3D. In addition, the optimization strategy greatly improves the real-time performance of the fused models compared with the ordinary 3D convolution model. Finally, comparing the experimental results of the models in this thesis with those of other common models shows that the series model and the parallel model are more accurate and have better real-time performance, which should help in applying them to engineering practice.

The thesis concludes by summarizing and analyzing the proposed human action recognition methods, suggesting improvements to this work, and giving an outlook on the future development of the field.
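The two frame-sampling steps described in the abstract (a random run of consecutive frames for the first method, and one random frame from each of N segments for the second) can be illustrated with a minimal Python sketch. This is not the thesis's own code: the function names, the clip_len parameter, and the rule for splitting a video into nearly equal segments are assumptions made only for illustration.

```python
import random


def sample_consecutive_frames(num_frames, clip_len, rng=random):
    """Return indices of a randomly placed run of `clip_len` consecutive frames.

    Hypothetical helper: `clip_len` is an assumed parameter, since the
    abstract does not state how many consecutive frames are extracted.
    """
    if clip_len <= 0 or clip_len > num_frames:
        raise ValueError("clip_len must be between 1 and num_frames")
    # Choose a start so that the whole clip fits inside the video.
    start = rng.randrange(num_frames - clip_len + 1)
    return list(range(start, start + clip_len))


def sample_segment_frames(num_frames, num_segments, rng=random):
    """Split the frame indices into `num_segments` segments and pick one
    random frame index from each segment.

    Assumption: segments are contiguous and nearly equal in length.
    """
    if num_segments <= 0 or num_segments > num_frames:
        raise ValueError("num_segments must be between 1 and num_frames")
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments        # inclusive bound
        end = (k + 1) * num_frames // num_segments    # exclusive bound
        indices.append(rng.randrange(start, end))
    return indices


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sample
    print(sample_consecutive_frames(100, 16, rng))
    print(sample_segment_frames(100, 8, rng))
```

Under these assumptions, segment sampling always spans the whole video with only N frames, whereas a consecutive clip covers a single short window. This is one plausible reading of why the abstract presents the segment scheme as an improvement in video processing.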
Keywords/Search Tags: Long Short-Term Memory, Self-Attention Mechanism, 3D Convolution, Series Model, Parallel Model