Research On Human Action Recognition Based On Video Stream

Posted on: 2018-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Chi
GTID: 2348330512996118
Subject: Engineering

Abstract/Summary:
In recent years, with the rapid development of the Internet and multimedia technology, digital resources have grown exponentially. Digital video, an important part of these resources, has entered every aspect of people's lives, and classifying human actions in digital videos quickly and accurately has become an active research topic. To address this problem, the main goal of this thesis is to use convolutional neural networks from deep learning to extract the spatial and temporal features of human actions in videos, and thereby classify those actions quickly and accurately. The main work and achievements are as follows:

(1) A two-stream residual network (Two-stream ResNet, TS-ResNet) is proposed for human action recognition, to address the heavy computational cost of convolutional neural networks on high-quality video. First, the algorithm builds a two-stream structure consisting of a spatial recognition stream and a temporal recognition stream, which extract the spatial and temporal features of the video, respectively. The two feature sets are then fused and fed into a classifier. Finally, an experimental analysis is carried out. TS-ResNet increases the depth of the network and improves recognition accuracy while reducing the time complexity of the algorithm. The results show that TS-ResNet achieves 0.35% higher accuracy than iDT, the best-performing hand-crafted-feature algorithm, on the UCF101 dataset and 5.6% higher accuracy on the HMDB51 dataset. Compared with the conventional convolutional neural network method based on VLAD vectors, TS-ResNet is 2.05% higher on UCF101 and 6.4% higher on HMDB51.

(2) A deep fusion residual network (DF-ResNet) is proposed to address the insufficient robustness of the traditional residual network. The algorithm removes the deepest network from the traditional residual network, uses more medium-depth networks, and increases the number of fusions; this increases the number of potential underlying network combinations and improves overall network performance. DF-ResNet is evaluated on the UCF101 and HMDB51 datasets, and the experimental analysis shows that it is more accurate than the traditional residual network, by 0.6% and 1%, respectively.

(3) Building on the above algorithms, and to make further use of temporal information, this thesis proposes a two-stream deep fusion residual network (TDF-ResNet), which extends the deep fusion residual network into the TS-ResNet structure to recognize human actions in videos. Compared with the proposed TS-ResNet, the experimental analysis verifies that TDF-ResNet uses temporal information more efficiently and achieves higher accuracy: it improves accuracy on the two datasets by 1.25% and 0.4%, respectively.
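The fuse-then-classify step of TS-ResNet described in (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the fusion rule or the classifier, so the sketch assumes that the spatial and temporal feature vectors are concatenated and passed to a linear softmax classifier. The functions `fuse_features`, `linear`, `softmax` and `classify`, and all feature values and weights, are hypothetical; in the real network the two feature vectors would come from the spatial and temporal residual streams.

```python
import math
from typing import List, Sequence, Tuple


def fuse_features(spatial: Sequence[float], temporal: Sequence[float]) -> List[float]:
    """Feature-level fusion by concatenation (an ASSUMED fusion rule)."""
    return list(spatial) + list(temporal)


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Hypothetical linear classifier: one weight row and one bias per action class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(spatial: Sequence[float], temporal: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> Tuple[int, List[float]]:
    """Fuse the two stream features, score every class, return (best class, probabilities)."""
    probs = softmax(linear(fuse_features(spatial, temporal), weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    # Toy 2-D features per stream and 3 hypothetical action classes.
    spatial_feat = [0.2, 0.8]
    temporal_feat = [0.5, 0.1]
    W = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
    b = [0.0, 0.0, 0.0]
    label, probs = classify(spatial_feat, temporal_feat, W, b)
    print(label)  # prints 1
```

Concatenation keeps both streams' features intact and lets the classifier learn how much weight each stream deserves; averaging the two streams' class scores (late fusion) is a common alternative.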
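Both streams of TS-ResNet are residual networks, and the abstract credits them with increasing network depth. The identity shortcut that makes deeper residual networks trainable can be sketched in plain Python as y = ReLU(F(x) + x). This is a simplified, hypothetical illustration: real residual blocks use convolutions and batch normalization, whereas here F is assumed to be two small fully connected layers (`w1`, `w2`) so that the example stays self-contained. The helper names `matvec`, `relu` and `residual_block` are illustrative only.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product (hypothetical stand-in for a convolution layer)."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def residual_block(x: Sequence[float], w1: Matrix, w2: Matrix) -> List[float]:
    """y = ReLU(F(x) + x), with F(x) = W2 . ReLU(W1 . x) and an identity shortcut."""
    f = matvec(w2, relu(matvec(w1, x)))
    if len(f) != len(x):
        raise ValueError("identity shortcut requires F(x) and x to have equal length")
    return relu([fi + xi for fi, xi in zip(f, x)])


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # F(x) = 0, so the shortcut passes a non-negative input through unchanged.
    print(residual_block([1.0, 2.0], zeros, zeros))  # [1.0, 2.0]
    w1 = [[1.0, 0.0], [0.0, 1.0]]
    w2 = [[0.5, 0.0], [0.0, 0.5]]
    print(residual_block([1.0, -2.0], w1, w2))  # [1.5, 0.0]
```

With all-zero weights the block reduces to the identity on non-negative inputs, which is one intuition for why stacking more residual blocks need not degrade the network.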
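The abstract does not give the exact DF-ResNet topology, so the following only illustrates what counting "potential underlying network combinations" can mean. It assumes the unraveled view of residual networks (an assumption here, not stated in the abstract): each of n residual blocks is either passed through or skipped via its shortcut, giving 2^n paths, of which C(n, k) traverse exactly k residual branches. Under this view the deepest path (k = n) is unique and most paths have medium depth. The functions `path_length_histogram` and `brute_force_histogram` are hypothetical names; the second cross-checks the closed form by enumeration.

```python
from itertools import product
from math import comb
from typing import Dict


def path_length_histogram(num_blocks: int) -> Dict[int, int]:
    """Closed form (unraveled view, ASSUMED): comb(n, k) paths pass through
    exactly k residual branches, for k = 0..n."""
    return {k: comb(num_blocks, k) for k in range(num_blocks + 1)}


def brute_force_histogram(num_blocks: int) -> Dict[int, int]:
    """Enumerate every enter/skip choice per block and count branches entered."""
    hist: Dict[int, int] = {k: 0 for k in range(num_blocks + 1)}
    # 1 = go through the residual branch, 0 = take the identity shortcut
    for choices in product((0, 1), repeat=num_blocks):
        hist[sum(choices)] += 1
    return hist


if __name__ == "__main__":
    n = 4
    print(path_length_histogram(n))  # {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
    print(sum(path_length_histogram(n).values()))  # 16, i.e. 2**4
```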
Keywords/Search Tags: Deep learning, residual network, two-stream convolutional network, human action recognition, deep fusion