
Research On Gesture Recognition Algorithm Based On Multi-stream Three Dimensions Convolutional Neural Network

Posted on: 2018-10-17 | Degree: Master | Type: Thesis
Country: China | Candidate: W H Yang
GTID: 2348330521450905 | Subject: Computer software and theory
Abstract/Summary:
With the rapid spread of computers, human-computer interaction has become an important part of daily life. As a simple and natural form of human-computer interaction, gesture recognition has become a hotspot in computer vision. Traditional gesture recognition algorithms commonly improve recognition accuracy by designing more representative features. However, designing robust features requires rich professional knowledge and practical experience. This greatly increases the difficulty of research, and hand-crafted features do not necessarily transfer to large volumes of highly varied gesture data. Finding an easy-to-use and effective method is therefore a key research topic in gesture recognition. With the emergence of large-scale data and high-performance computing, neural networks with large numbers of parameters can be fitted quickly and efficiently. Because neural networks learn from data rather than from manually designed features, they have become a popular research direction in gesture recognition.

C3D (Convolutional 3D) neural networks capture the temporal and spatial information in a video sequence simultaneously and are commonly used to recognize dynamic gestures. C3D-based gesture recognition algorithms have three main problems:
(1) The C3D model normalizes variable-length sequences by sampling them uniformly. However, most gestures are not performed at a constant speed, so hand-movement information is not uniformly distributed along the gesture sequence. A C3D model based on uniform sampling therefore loses much of the movement information at the input stage.
(2) Because of users' behavioral habits, the collection environment, the background, and other factors, the input sequence may contain irrelevant or deformed gestures. A C3D model that processes the entire sequence then produces worse results.
(3) Dynamic gestures are difficult to recognize, so an algorithm that relies on the C3D model alone cannot achieve satisfactory recognition results.

Considering three data streams simultaneously, we propose four strategies that improve the C3D model and address these problems:
(1) A uniform random sampling method based on optical flow. The magnitude of the optical flow expresses how intense the hand movement is, and it determines how many key frames are extracted from each segment. In addition, we add randomness to the sampling process to increase the amount of training data and thereby improve the performance of the C3D network. The sampling still ensures that key frames are drawn uniformly from the original video.
(2) A layering strategy. The sequence is divided into several segments to capture more local detail from the original video and improve the network model. The video sequence is split into multiple sub-sequences, and the C3D results on these sub-sequences are combined. This avoids classification errors caused by wrong segments.
(3) A multimodal data fusion strategy. RGB, depth, and optical flow data are used to train the C3D model, which extracts the characteristics of each type of data. Fusing the different data types helps the algorithm avoid the limitations of any single type.
(4) A pre-trained C3D model is used to extract the outputs of a network layer as features, and an SVM (Support Vector Machine) is trained on these features.

To verify the validity of the proposed algorithm, module tests of the four improvement strategies are carried out on the ChaLearn LAP IsoGD dataset. The proposed method is then compared with state-of-the-art gesture recognition methods on this dataset, and the comparison shows that it outperforms the other algorithms.
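The optical-flow-based sampling strategy can be illustrated with a minimal Python sketch. It assumes that per-frame optical-flow magnitudes have already been computed, for example as the mean flow magnitude of each frame. The sketch splits the video into equal-length segments and gives each segment a share of the frame budget proportional to its summed flow. It then draws one random frame from each equal-width bin inside a segment, so the picks stay evenly spread. The function names, the proportional allocation rule, and the per-bin randomization are illustrative assumptions, not the thesis's exact procedure.

```python
import random
from typing import List, Sequence

# Hypothetical helper names; the thesis does not publish its code.


def allocate_frames(segment_energy: Sequence[float], total: int) -> List[int]:
    """Split `total` frames across segments in proportion to their flow energy."""
    n_seg = len(segment_energy)
    energy_sum = sum(segment_energy)
    if energy_sum <= 0:
        # No motion signal at all: fall back to an even split.
        shares = [total / n_seg] * n_seg
    else:
        shares = [total * e / energy_sum for e in segment_energy]
    counts = [int(s) for s in shares]
    # Largest-remainder rounding: leftover frames go to the largest fractions.
    leftover = total - sum(counts)
    order = sorted(range(n_seg), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def sample_in_range(start: int, end: int, k: int, rng: random.Random) -> List[int]:
    """Pick k indices from [start, end): one random frame per equal-width bin."""
    length = end - start
    picks = []
    for j in range(k):
        lo = start + (length * j) // k
        hi = start + (length * (j + 1)) // k
        if hi <= lo:  # more picks than frames: bins collapse and frames repeat
            hi = lo + 1
        picks.append(rng.randrange(lo, hi))
    return picks


def flow_guided_sample(flow_magnitude: Sequence[float], num_segments: int,
                       num_frames: int, seed: int = 0) -> List[int]:
    """Return frame indices in temporal order, denser where optical flow is stronger."""
    n = len(flow_magnitude)
    if not 1 <= num_segments <= n:
        raise ValueError("need 1 <= num_segments <= number of frames")
    rng = random.Random(seed)
    bounds = [n * s // num_segments for s in range(num_segments + 1)]
    energy = [sum(flow_magnitude[bounds[s]:bounds[s + 1]]) for s in range(num_segments)]
    counts = allocate_frames(energy, num_frames)
    indices: List[int] = []
    for s, k in enumerate(counts):
        indices.extend(sample_in_range(bounds[s], bounds[s + 1], k, rng))
    return indices


if __name__ == "__main__":
    flow = [0.1] * 20 + [2.0] * 10 + [0.1] * 20  # a burst of motion mid-sequence
    print(flow_guided_sample(flow, num_segments=5, num_frames=16))
```

Different `seed` values yield different clips from the same video. This is one way such randomness could enlarge the training set while the per-bin picks keep coverage uniform.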
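The layering strategy and the multimodal fusion strategy both combine several sets of C3D predictions. This sketch assumes simple late fusion by averaging. Each stream is cut into contiguous sub-sequences and each sub-sequence is scored. The clip scores are then averaged per modality, and the per-modality scores are averaged with optional weights. `toy_classify` is a hypothetical stand-in for a trained C3D model, and averaging is an assumed fusion rule: the abstract does not say how the results are combined.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Scores = List[float]  # one score per gesture class

# Hypothetical helper names; averaging is an assumed fusion rule.


def split_subsequences(frames: Sequence, num_parts: int) -> List[Sequence]:
    """Cut a sequence into num_parts contiguous, near-equal sub-sequences."""
    n = len(frames)
    if not 1 <= num_parts <= n:
        raise ValueError("need 1 <= num_parts <= number of frames")
    bounds = [n * p // num_parts for p in range(num_parts + 1)]
    return [frames[bounds[p]:bounds[p + 1]] for p in range(num_parts)]


def average_scores(score_lists: Sequence[Scores],
                   weights: Optional[Sequence[float]] = None) -> Scores:
    """(Weighted) mean of several class-score vectors of equal length."""
    if weights is None:
        weights = [1.0] * len(score_lists)
    total = sum(weights)
    num_classes = len(score_lists[0])
    return [sum(w * s[c] for w, s in zip(weights, score_lists)) / total
            for c in range(num_classes)]


def layered_multimodal_predict(
    streams: Dict[str, Sequence],
    classify: Callable[[str, Sequence], Scores],
    num_parts: int,
    modality_weights: Optional[Dict[str, float]] = None,
) -> Tuple[int, Scores]:
    """Score each sub-sequence of each stream, then fuse across parts and modalities."""
    names = list(streams)
    per_modality = []
    for name in names:
        parts = split_subsequences(streams[name], num_parts)
        # Layering: average clip scores so one bad segment has limited influence.
        per_modality.append(average_scores([classify(name, part) for part in parts]))
    weights = [1.0 if modality_weights is None else modality_weights[name] for name in names]
    # Multimodal late fusion: weighted mean of the per-modality scores.
    fused = average_scores(per_modality, weights)
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


def toy_classify(modality: str, clip: Sequence[float]) -> Scores:
    """Hypothetical stand-in for a trained C3D model: 2-class scores from the clip mean."""
    m = sum(clip) / len(clip)
    return [1.0 - m, m]


if __name__ == "__main__":
    streams = {"rgb": [0.9] * 6, "depth": [0.3] * 6, "flow": [0.6] * 6}
    print(layered_multimodal_predict(streams, toy_classify, num_parts=3))
```

Per-modality weights are one possible knob for trusting, say, depth more than RGB. Whether the thesis weights its streams is not stated.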
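For the SVM strategy, the abstract does not name the SVM variant, the kernel, or the layer whose outputs serve as features. As a self-contained stand-in, this sketch trains one-vs-rest linear SVMs with the Pegasos stochastic subgradient method. It uses the basic variant, without the optional projection step. The short toy vectors stand in for C3D layer activations. A dedicated SVM library would normally be used instead; Pegasos only keeps the example dependency-free.

```python
import random
from typing import Dict, Hashable, List, Sequence

# Hypothetical helper names; Pegasos is a swapped-in solver, not the thesis's.


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> List[float]:
    """Binary linear SVM (hinge loss + L2) trained with Pegasos; y must be in {-1, +1}."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # constant feature acts as a (regularised) bias
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1.0:  # hinge loss is active: step towards this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def train_one_vs_rest(X: Sequence[Sequence[float]], labels: Sequence[Hashable],
                      **kwargs) -> Dict[Hashable, List[float]]:
    """One binary SVM per class (labels must be sortable)."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models: Dict[Hashable, List[float]], x: Sequence[float]) -> Hashable:
    """Return the class whose SVM gives the highest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], xb)))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for C3D layer activations.
    feats = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
    labs = ["swipe", "swipe", "wave", "wave"]
    models = train_one_vs_rest(feats, labs)
    print(predict(models, [2.5, 2.0]))
```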
Keywords/Search Tags: C3D, multi-stream, uniform random sampling based on optical flow, layering strategy, multimodal data fusion, SVM