
Research On Dynamic Gesture Recognition Method Based On Three-dimensional Deep Neural Network

Posted on: 2019-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Xu
GTID: 2428330572451751
Subject: Engineering
Abstract/Summary:
With the rapid development of science and technology, more and more technologies are being applied to people's daily lives. Making life more comfortable and simpler through science and technology has therefore become a shared goal of academia and industry. In recent years, the popularity of artificial intelligence has set off a wave of interest in intelligent living, in which human-computer interaction, the channel of communication between people and machines, plays an essential role. Gesture recognition, a simple and natural form of interaction, has attracted growing attention because it promises to make human-computer interaction easier, more natural, and closer to human habits. To improve the accuracy of dynamic gesture recognition, this thesis makes the following contributions.

(1) To preserve as many of the frames that carry motion information in a gesture video as possible, a "key frame" extraction method is proposed. First, because a convolutional neural network requires inputs of a uniform size, the number of frames in the gesture videos must be unified. The reference frame number for the network input is determined from a statistical analysis of the dataset. Second, to keep the "key frames" that contain abundant motion information and discard frames that carry little, a weighted average sampling method based on optical flow is used during video sampling. Because the optical flow magnitude reflects the intensity of motion, each segment of the original video is sampled in proportion to its average optical flow value. The result is a gesture dataset with a unified frame number and rich motion information.

(2) To address the temporal nature of dynamic gestures and the degradation problem encountered in deep networks, a 3D convolutional neural network modified with residual connections is used to extract gesture features. Dynamic gesture recognition requires a three-dimensional convolutional neural network that extracts the temporal and spatial features of a gesture simultaneously. To learn more abstract gesture features, the thesis uses a ResC3D network, which combines the residual idea with the three-dimensional convolutional neural network, to extract features from the RGB, depth, and optical flow data separately.

(3) Because a single type of data cannot express all the information in a gesture, a feature fusion strategy based on canonical correlation analysis is proposed. Gesture recognition benefits from fusing several kinds of data to obtain more information. The thesis first analyzes fusion at the video, feature, and decision levels, and selects feature-level fusion according to the actual situation. For feature fusion, it then analyzes the advantages and disadvantages of mean fusion and cascade fusion. Weighing recognition performance against training time, it adopts a fusion method based on canonical correlation analysis. This method fuses the RGB, depth, and optical flow features according to the correlation among the modal features and yields a comprehensive, information-rich feature. The fusion strategy lays the foundation for the subsequent classification and recognition.

To verify the effectiveness of the proposed algorithm, the thesis runs experiments and measures accuracy on the official dataset of the ChaLearn LAP Large-scale Isolated Gesture Recognition Challenge, namely the IsoGD dataset. First, comparative experiments are conducted and analyzed separately for each of the contributions above, demonstrating the effectiveness and necessity of the improvements. The final result of the algorithm is then compared with other state-of-the-art algorithms evaluated on the same dataset. The comparison shows that the proposed method outperforms the other approaches.
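The optical-flow-weighted sampling described in (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the segment count, the rounding rule, or the handling of short clips, so those are assumptions here. The function names `allocate_quota` and `select_key_frames`, the `num_segments` parameter, and the input `flow` (one precomputed mean optical-flow magnitude per frame) are all hypothetical.

```python
from typing import List, Sequence


def allocate_quota(weights: Sequence[float], capacities: Sequence[int], total: int) -> List[int]:
    """Share `total` picks across segments in proportion to `weights`,
    never giving a segment more picks than it has frames."""
    weight_sum = sum(weights)
    if weight_sum > 0:
        shares = [w / weight_sum for w in weights]
    else:
        # Assumption: a clip with no measured motion falls back to equal shares.
        shares = [1.0 / len(weights)] * len(weights)
    quota = [0] * len(weights)
    for _ in range(total):
        open_segments = [i for i in range(len(weights)) if quota[i] < capacities[i]]
        # Give the next pick to the segment furthest below its ideal share.
        best = max(open_segments, key=lambda i: shares[i] * total - quota[i])
        quota[best] += 1
    return quota


def select_key_frames(flow: Sequence[float], target: int, num_segments: int = 4) -> List[int]:
    """Return `target` frame indices, sampling high-motion segments more densely."""
    n = len(flow)
    if n == 0 or target <= 0:
        raise ValueError("need at least one frame and a positive target")
    if n <= target:
        # Assumption: short clips are padded by repeating frames.
        return [i * n // target for i in range(target)]
    num_segments = max(1, min(num_segments, n))
    # Contiguous segments of (nearly) equal length.
    bounds = [i * n // num_segments for i in range(num_segments + 1)]
    lengths = [bounds[i + 1] - bounds[i] for i in range(num_segments)]
    # Average optical-flow value of each segment, used as its sampling weight.
    averages = [sum(flow[bounds[i]:bounds[i + 1]]) / lengths[i] for i in range(num_segments)]
    quota = allocate_quota(averages, lengths, target)
    indices: List[int] = []
    for i, k in enumerate(quota):
        # Evenly spaced picks inside the segment.
        indices.extend(bounds[i] + (j * lengths[i]) // k for j in range(k))
    return indices


if __name__ == "__main__":
    # Toy clip: 24 frames, with strong motion only in the middle third.
    toy_flow = [0.1] * 8 + [2.0] * 8 + [0.1] * 8
    print(select_key_frames(toy_flow, target=12, num_segments=3))
```

In the toy clip the high-motion middle segment keeps all eight of its frames, while the two nearly static segments share the remaining four picks. In practice the per-frame flow values would come from an optical flow estimator, which is outside the scope of this sketch.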
Keywords/Search Tags:dynamic gesture recognition, key frame extraction, three-dimensional convolution neural network, canonical correlation analysis, feature fusion