At present, speech rehabilitation for patients with speech disorders relies mainly on human-led training, which suffers from limited resources, high cost, and inconvenience. With the rapid development of the mobile Internet, artificial intelligence, and precision medicine, mobile applications are becoming increasingly popular, and using the Internet and artificial intelligence to help patients with speech disorders recover is of great significance. This paper focuses on evaluating the accuracy of syllable pronunciation in speech rehabilitation training and proposes an isolated-syllable speech recognition method based on Connectionist Temporal Classification (CTC). First, the effectiveness of the method is verified on a normal speech dataset. Then, three methods for evaluating the pronunciation of isolated syllables are proposed and tested on a disordered speech dataset. Finally, a system for evaluating the syllable pronunciation accuracy of patients with speech disorders is built. The main work of this paper is as follows:

1. A tone recognition method based on a Convolutional Neural Network (CNN) is proposed. Experiments on the normal speech dataset show that it achieves an accuracy of 95.93%. Based on the characteristics of the tone recognition task, we argue that training the tone recognition model on a small base-syllable set reduces the amount of training data required. Experiments show that, when the base-syllable set is chosen properly, the model still reaches an accuracy of 92.17% even with the training data cut to one tenth of the original amount. Applied to evaluating the tone pronunciation of patients with speech disorders, the CNN tone recognition model achieves an overall accuracy of 75.03%.
2. The differences between continuous and isolated-syllable speech in Mandarin are analyzed theoretically, and the performance and characteristics of a continuous speech recognition system applied directly to isolated-syllable speech are studied experimentally. The experiments show that the longer an isolated syllable lasts, the worse the continuous speech recognition system performs. An isolated-syllable speech recognition method based on CTC is proposed, and the pros and cons of different modeling-unit schemes are compared experimentally. Initials and tonal finals are selected as the modeling units of the CTC model in the pronunciation evaluation system.
3. Based on the CTC isolated-syllable recognition model, three methods for detecting mispronunciation of isolated syllables are proposed, and the performance of each is studied experimentally. Experiments on the disordered speech dataset show that the method based on acoustic confusion information and the confidence of the recognition results achieves the highest overall detection accuracy: 77.19% on initials and 71.16% on finals. The method based on greedy decoding is the fastest, with overall detection accuracies of 68.94% on initials and 62.24% on finals.
4. A client-server system for evaluating the syllable pronunciation accuracy of patients with speech disorders is built. It provides a platform on which these patients can carry out speech rehabilitation training.
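The abstract does not detail how the greedy-decoding-based detection method in item 3 works. The Python sketch below shows one plausible reading under stated assumptions. It assumes the CTC model emits per-frame posteriors over a label set made up of a blank symbol, initials, and tonal finals, and that an initial or final counts as mispronounced when the greedily decoded unit differs from the target. The toy label set (with notation such as `a3` for final "a" in tone 3), `greedy_ctc_decode`, and `detect_mispronunciation` are hypothetical illustrations, not the thesis's implementation.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical toy inventory; index 0 is assumed to be the CTC blank.
# "a3" denotes the final "a" with tone 3 (a tonal final).
LABELS = ["<blank>", "b", "m", "a1", "a3", "ao2"]
BLANK = 0
INITIALS = {"b", "m"}  # hypothetical subset of Mandarin initials


def greedy_ctc_decode(posteriors: Sequence[Sequence[float]]) -> List[str]:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    decoded: List[str] = []
    prev = None
    for frame in posteriors:
        idx = max(range(len(frame)), key=lambda k: frame[k])
        # Emit a label only if it differs from the previous frame's label
        # and is not blank, so "m m <blank> a1" yields ["m", "a1"].
        if idx != prev and idx != BLANK:
            decoded.append(LABELS[idx])
        prev = idx
    return decoded


def detect_mispronunciation(
    decoded: List[str], target_initial: Optional[str], target_final: str
) -> Tuple[bool, bool]:
    """Return (initial_ok, final_ok) for one isolated syllable.

    Assumption: a correct syllable decodes to exactly one initial (none for a
    zero-initial syllable) and exactly one tonal final; order is not checked.
    """
    initials = [u for u in decoded if u in INITIALS]
    finals = [u for u in decoded if u not in INITIALS]
    expected_initials = [target_initial] if target_initial else []
    return initials == expected_initials, finals == [target_final]


if __name__ == "__main__":
    # Toy posteriors for an attempt at "ma3" spoken with the wrong tone (a1).
    frames = [
        [0.10, 0.10, 0.70, 0.05, 0.03, 0.02],  # m
        [0.10, 0.10, 0.60, 0.10, 0.05, 0.05],  # m (repeat, merged)
        [0.60, 0.10, 0.10, 0.10, 0.05, 0.05],  # blank
        [0.10, 0.05, 0.05, 0.70, 0.05, 0.05],  # a1
        [0.10, 0.05, 0.05, 0.60, 0.15, 0.05],  # a1 (repeat, merged)
    ]
    units = greedy_ctc_decode(frames)
    print(units)  # ['m', 'a1']
    print(detect_mispronunciation(units, "m", "a3"))  # (True, False)
```

Greedy (best-path) decoding needs only one argmax per frame and no search, which fits the abstract's point that this is the fastest of the three methods. The trade-off is that it ignores acoustic confusion and confidence information, which the more accurate method uses.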