In the 1990s, dialect identification began to attract growing attention. Researchers in many countries have studied the characteristics of different dialects and the models used to classify them. Dialect identification has also contributed significantly to determining the regional origin of suspects in criminal cases. China is a multi-ethnic country with large linguistic differences, so the study of dialect identification is indispensable, and research in this field is important for the promotion and application of speech recognition technology. The acoustic models commonly used in earlier years were mainly the Hidden Markov Model (HMM) and Artificial Neural Networks (ANN), such as the BP neural network and the RBF neural network; these are still in use and continue to be optimized. In recent years, deep learning has been widely applied in speech recognition. Training multi-layer neural networks with deep learning algorithms yields better initial weights and allows the network to converge to an optimum more quickly, which remedies shortcomings of traditional neural networks. This paper studies four local dialects of Hunan Province (Changsha, Zhuzhou, Hengyang and Xiangtan) and proposes an acoustic model that combines a gated recurrent unit (GRU) neural network with an HMM, which achieves good recognition results. The main research contents are as follows.

Speech feature extraction is studied. To address the shortcomings of the traditional MFCC feature parameters, the CFCC feature parameters, which are based on a model of human auditory perception, are investigated. The principle and extraction method of the CFCC feature parameters are introduced in detail, and the effects of cochlear filter parameters such as bandwidth and center frequency on the extraction results are analyzed. The effects of different feature parameters on the recognition of Hunan dialects are compared. In the simulation experiments, Gaussian white noise, car noise and speech babble noise were each added, and experiments were carried out at different signal-to-noise ratios (SNR). The experimental results show that the CFCC feature parameters give better recognition performance across the SNR conditions tested, and that the advantage is more pronounced under car noise and speech babble noise.

An acoustic model based on a GRU neural network and an HMM is established. First, the GRU neural network is trained on the extracted feature parameters to obtain an initial recognition rate. Training then continues with the forward-backward algorithm of the HMM, which iteratively updates and optimizes the model. Finally, the final recognition rate is obtained by Viterbi decoding. The model is compared with the traditional GMM-HMM acoustic model and with an acoustic model based on the BP neural network. In these experiments, Gaussian white noise is added at different SNRs and different feature parameters are extracted. The experimental results show that the GRU neural network outperforms both the traditional acoustic model and the BP neural network, improving the robustness and recognition rate of the dialect identification system.
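
The noise conditions described above depend on mixing a noise signal into clean speech at a chosen SNR. The abstract does not say how this mixing was done, so the following is only a minimal sketch of one common approach, using the standard library alone: the noise is scaled so that the ratio of clean-signal power to noise power matches the target SNR in decibels. The function names (`add_noise_at_snr`, `white_gaussian_noise`, `measured_snr_db`), the 16 kHz sampling rate and the sine wave standing in for an utterance are all illustrative assumptions, not details taken from the paper.

```python
import math
import random


def signal_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then mix.

    Returns (noisy_signal, scaled_noise). Hypothetical helper, not the
    paper's procedure.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = signal_power(clean)
    p_noise = signal_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise must have non-zero power")
    # SNR(dB) = 10 * log10(P_clean / P_noise)  =>  P_noise = P_clean / 10^(SNR/10)
    target_noise_power = p_clean / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


def white_gaussian_noise(n, seed=0):
    """Generate n samples of zero-mean, unit-variance Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def measured_snr_db(clean, scaled_noise):
    """SNR in dB actually realised by a clean signal and its added noise."""
    return 10.0 * math.log10(signal_power(clean) / signal_power(scaled_noise))


# Toy stand-in for one second of speech: a 440 Hz tone (assumed 16 kHz rate).
sample_rate = 16000
clean = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate)
         for t in range(sample_rate)]
noise = white_gaussian_noise(len(clean), seed=1)

noisy, scaled = add_noise_at_snr(clean, noise, snr_db=10.0)
print(round(measured_snr_db(clean, scaled), 6))
```

The same scaling would apply to recorded car noise or babble noise in place of `white_gaussian_noise`, provided the noise segment is cut to the length of the utterance. Because the gain is computed from the measured powers, the realised SNR matches the target up to floating-point rounding.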