| Parkinson’s disease (PD) is a neurodegenerative disease that is clinically characterized by resting tremor, slowed and reduced movement, increased muscle tone, postural instability, and other symptoms. Most patients show obvious signs only after the disease has progressed significantly, so few cases are detected at an early stage and many patients miss the optimal treatment window, which seriously affects their quality of life. Unlike for many other diseases, accurate clinical diagnosis of PD currently relies on invasive and expensive methods. There is therefore an urgent need for a low-cost, non-invasive method to support the auxiliary diagnosis of PD. Deep-learning-based single-modal PD recognition from gait and from voiceprint has already been realized with good accuracy. Compared with a single modality, a multimodal approach can extract more feature information by exploiting the correlation and complementarity between modalities, and can thus train a better detection model and improve accuracy. Analyzing the correlation and complementarity among modalities is the key point, and using them for effective fusion is the main difficulty. Early symptoms of PD include vocal cord impairment and a "masked face", among others. Based on multimodal deep learning, this thesis proposes a multimodal PD detection model that uses these early features to classify PD patients, thereby supporting the early detection and diagnosis of PD. The research contents and contributions of this thesis are as follows. Firstly, this thesis built a facial data set of PD patients and organized the data sets needed for the subsequent experiments. The voiceprint data set follows the one established by Sakar et al., which includes voice data from PD patients and healthy people (HP). The facial image data set of PD patients was built in this thesis. In this thesis, the 
video of 45 PD patients was collected, and 1970 facial images of PD patients were extracted from it. Meanwhile, voice samples of PD patients were collected, and the voiceprint data set was preliminarily expanded. Because the facial images were sourced from the internet and had low resolution, Deep Learning Face Super-Resolution reconstruction (D-FSR) was applied to improve image quality. At the same time, facial images with exaggerated expressions were selected from the CK and KDEF data sets to serve as the contrasting HP facial image data. Experiments show that the self-built PD face data set supports single-modal PD detection with an accuracy of 94.72%, which demonstrates its effectiveness and feasibility. Secondly, single-modal PD detection models based on a Convolutional Neural Network (CNN) are established. Models based on voiceprint and on facial features were built separately to verify the effectiveness and feasibility of each before integration into the multimodal model. The influence of data set size and network depth on detection performance is analyzed experimentally, and a more suitable data set and network model are selected. Finally, the models are compared with traditional PD detection methods. Experimental results show that the proposed method outperforms the traditional methods, with the voiceprint-based and face-based PD detection models reaching accuracies of 94.04% and 94.72%, respectively. Thirdly, a multimodal PD detection model based on CNN is established. First, two traditional fusion methods are used: feature-layer fusion, which exploits the correlation between modalities, and decision-layer fusion, which exploits the difference between modalities. On this basis, Feature Hybrid and Fusion (FHF) is proposed in this thesis to exploit the correlation and difference of the different modal features in the feature subspace simultaneously, in order to obtain more feature 
information, train a better detection model, and thus improve accuracy. In this thesis, the feature subspace is divided into four parts: the voiceprint and face feature subspaces in the single-modal domain, and the voiceprint and face feature subspaces in the multimodal domain. A correlation loss function is used to learn the correlation between the features of the different modalities in the multimodal domain, and a difference loss function is used to learn the difference between the features of the same modality in the single-modal and multimodal domains. The fused global features may contain redundant information, which can reduce the accuracy of the model. This thesis therefore proposes a dimensionality reduction model (DR-VGG-16) that performs convolutional dimensionality reduction; experiments verify that convolutional dimensionality reduction reduces the time complexity of training. The final experimental results show accuracies of 91.39% for feature-layer fusion, 98.67% for decision-layer fusion, and 96.24% for the FHF model. Compared with the single-modal detection models, the multimodal detection model improves detection accuracy and provides a better option for the auxiliary diagnosis of PD. |