
Research On Video-based Human Action Recognition Technology

Posted on: 2020-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Chen
GTID: 2428330596495047
Subject: Computer Science and Technology
Abstract/Summary:
Human action recognition has long been a research hotspot in computer vision. It is widely used in fields such as medical rehabilitation training, intelligent transportation, and human-computer interaction, so it has broad application prospects and considerable research value. However, video data is high-dimensional and complicated to process, and existing models still achieve limited recognition accuracy, so strategies for improvement are needed. This thesis proposes two model structures, one based on LRCN and one based on the two-stream network. The first is a relatively simple serial (tandem) network built on LRCN. The second is built on the two-stream network and introduces a fusion strategy together with a method for extracting global temporal information from the video. The main work of this thesis is as follows:

(1) The datasets and common methods of human action recognition are summarized, analyzed, and classified, and the basic theory of deep learning on which the work relies is reviewed.

(2) Building on LRCN, a human action recognition method with a serial CNN, Bi-LSTM, and MLP architecture is proposed. For video preprocessing, an average sparse down-sampling method is adopted. It effectively solves the problem that a video cannot be fed directly into the convolutional network, and it reduces the time complexity, shortening both model training and forward propagation. In addition, unlike other models that use only a unidirectional LSTM, the Bi-LSTM learns both the forward-order and the reverse-order information of the video sequence.

(3) Building on the two-stream network, a human action recognition method with a parallel architecture combining a two-stream network and Bi-GRU is proposed. Convolutional fusion is introduced to merge the two streams, and GRU is used instead of LSTM, which also increases speed. The same sparse down-sampling method is used during training to make video processing tractable. The two-stream network extracts temporal and spatial information from the video at the same time, but its temporal stream captures only local information. Therefore, after the features are fused, a GRU extracts global information from the video to compensate for this limitation.

Of the two proposed models, the parallel network clearly outperforms the serial network. This is mainly because the two-stream network in this thesis uses a deeper convolutional network, which can learn more abstract features, and because its structure is more elaborate: the designed convolutional fusion and the GRU-based extraction of global temporal information together capture richer information.
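Both models rely on average sparse down-sampling, but the abstract does not give its exact rule. The sketch below assumes one common reading: split the video into equal-length segments and keep the centre frame of each, so a clip of any length is reduced to a fixed number of frames. The function name `sparse_sample_indices` and the centre-frame choice are illustrative assumptions, not the thesis's code.

```python
from typing import List


def sparse_sample_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly over a video.

    Assumption: the video is split into num_samples equal-length segments
    and the centre frame of each segment is kept (a deterministic variant;
    training pipelines often draw a random offset inside each segment).
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    seg_len = num_frames / num_samples  # segment length, may be fractional
    # Centre of segment i is (i + 0.5) * seg_len; clamp to the last valid frame.
    return [min(int((i + 0.5) * seg_len), num_frames - 1) for i in range(num_samples)]


# Usage: keep only the sampled frames of a (hypothetical) decoded video.
frames = [f"frame_{k}" for k in range(100)]
sampled = [frames[i] for i in sparse_sample_indices(len(frames), 4)]
```

Here `sparse_sample_indices(100, 4)` returns `[12, 37, 62, 87]`. A video shorter than the requested count repeats frames instead of failing, which keeps the network input size fixed.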
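Both models pass per-frame features through a bidirectional recurrent layer (Bi-LSTM in the serial model, Bi-GRU in the two-stream model). The pure-Python sketch below illustrates only the bidirectional idea, using GRU cells: one cell reads the frame features in order, a second reads them in reverse, and the two states are concatenated at each time step. Weights are random, biases are omitted, and the names `GRUCell` and `bidirectional` are hypothetical; swapping in LSTM cells would give the Bi-LSTM variant.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell (Cho et al. formulation), randomly initialised, no biases."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input weights W* and recurrent weights U* for each gate.
        self.Wz, self.Uz = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wr, self.Ur = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wh, self.Uh = mat(hidden_size, input_size), mat(hidden_size, hidden_size)

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # New state interpolates between the old state and the candidate.
        return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, seq: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hidden_size
        outputs = []
        for x in seq:
            h = self.step(x, h)
            outputs.append(h)
        return outputs


def bidirectional(fwd: GRUCell, bwd: GRUCell, seq: List[Vector]) -> List[Vector]:
    """Concatenate forward states with states from a pass over the reversed sequence."""
    forward_states = fwd.run(seq)
    backward_states = bwd.run(seq[::-1])[::-1]  # re-reverse so index t aligns with frame t
    return [f + b for f, b in zip(forward_states, backward_states)]


# Usage: 8 frames of hypothetical 6-dimensional CNN features, hidden size 4 per direction.
feats = [[0.1 * t + 0.01 * d for d in range(6)] for t in range(8)]
fwd, bwd = GRUCell(6, 4, seed=1), GRUCell(6, 4, seed=2)
out = bidirectional(fwd, bwd, feats)
clip_descriptor = out[-1][:4] + out[0][4:]  # last forward state + first backward state
```

The last forward state and the first backward state have each seen the whole sequence, so concatenating them is one way to form a clip-level descriptor; the abstract does not say how the recurrent output is pooled before classification, so that choice is an assumption.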
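The abstract does not detail the convolutional fusion layer. A minimal sketch, assuming the common 1×1 formulation: the spatial and temporal feature maps (same height and width) are stacked along the channel axis, and a learned 1×1 convolution mixes the 2C stacked channels at every position into C_out output channels. Feature maps are nested lists indexed `[channel][row][col]`; the function name `conv_fusion_1x1` and the kernel size are assumptions.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def conv_fusion_1x1(spatial: FeatureMap, temporal: FeatureMap,
                    weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """Fuse two same-shaped feature maps with a 1x1 convolution (assumed kernel size).

    The maps are concatenated along the channel axis (2C channels); each output
    channel is a weighted sum over those 2C channels at every spatial position.
    weights has shape [C_out][2C]; bias has shape [C_out].
    """
    if len(spatial) != len(temporal):
        raise ValueError("streams must have the same number of channels")
    stacked = spatial + temporal  # channel concatenation -> 2C channels
    height, width = len(stacked[0]), len(stacked[0][0])
    if any(len(ch) != height or len(ch[0]) != width for ch in stacked):
        raise ValueError("streams must have the same spatial size")
    fused = []
    for w_row, b in zip(weights, bias):
        if len(w_row) != len(stacked):
            raise ValueError("each weight row must have 2C entries")
        fused.append([[b + sum(w * ch[i][j] for w, ch in zip(w_row, stacked))
                       for j in range(width)]
                      for i in range(height)])
    return fused


# Usage: two 2-channel 3x3 maps with constant values per channel.
sp = [[[1.0] * 3 for _ in range(3)], [[2.0] * 3 for _ in range(3)]]
tp = [[[3.0] * 3 for _ in range(3)], [[4.0] * 3 for _ in range(3)]]
avg_w = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
out = conv_fusion_1x1(sp, tp, avg_w, [0.0, 0.0])
```

With `avg_w` the layer reduces to averaging corresponding channels of the two streams (every value in `out[0]` is 2.0 and in `out[1]` is 3.0), so simple average fusion is a special case; training the weights lets the layer learn other cross-stream combinations. In a pipeline like the one described, the fused features would then feed the GRU that extracts global temporal information.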
Keywords/Search Tags: human action recognition, two-stream network, Inception-V3, ResNet, RNN