Throughout the history of human society, language has been an indispensable tool for communication, and dialects, as special variants of language, have long been a research focus of scholars in China and abroad. As a precious intangible cultural heritage of the Chinese nation, Chinese dialects should not be allowed to gradually disappear as Mandarin is increasingly promoted. Whether the goal is to protect and pass on dialects or to develop practical speech recognition applications for them, dialect language recognition has important research significance. In current language recognition research, finding effective features suited to the model and improving the model's nonlinear recognition performance remain the key research directions. Deep learning has driven great progress in language recognition research, and it offers excellent capabilities in feature extraction, modeling, and classification. Building on the residual structure in deep learning, combined with attention mechanisms and a recurrent neural network, this thesis studies dialect language recognition from the perspectives of both features and models. The main work of this thesis is as follows:

(1) Building datasets and performing preprocessing and feature extraction. Because public dialect speech datasets are difficult to obtain, this thesis establishes two experimental datasets: the first is a combined dataset collected through multiple acquisition methods, and the second comes from the iFLYTEK dialect competition data. A series of preprocessing steps is then applied to the speech data, and finally the spectrogram, Mel spectrogram, Mel-frequency cepstral coefficients (MFCC), and other speech features are extracted as input data for the language recognition model.

(2) Establishing MARNet, a dialect recognition model based on a convolutional neural network (CNN) and integrated with multiple attention mechanisms. The residual network is used as the baseline network, and the multi-head self-attention mechanism and the Triplet attention module are then introduced to improve the feature extraction performance of the model. Comparative experiments across different speech features led to the spectrogram being selected as the best input feature, and MARNet was then compared with other classical classification models to verify that it achieves better recognition performance.

(3) Improving MARNet. Because a network built on a CNN has difficulty attending to the temporal information in dialect speech, this thesis adopts a bidirectional gated recurrent unit (BiGRU) network to extract the forward and backward feature information of dialect speech, so that the information from the two directions can be learned and can complement each other. To address the high complexity of the network structure, this thesis introduces depthwise separable convolution to replace part of the standard convolutions, thereby reducing the computational cost of the network. The improved network is called MAR-BiGRU, and its effectiveness is demonstrated through ablation experiments and comparative experiments against other networks.
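
The computational saving that motivates the depthwise separable convolution in (3) can be illustrated with a simple weight count. The following is only a minimal sketch: the function names are hypothetical, and the kernel size and channel counts are assumed for illustration rather than taken from MAR-BiGRU. A standard k x k convolution needs k·k·C_in·C_out weights, whereas a depthwise k x k convolution followed by a 1 x 1 pointwise convolution needs k·k·C_in + C_in·C_out.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration (not from the thesis).
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

Under these assumed sizes the separable layer uses roughly 12% of the weights of the standard layer, which matches the general ratio 1/C_out + 1/k². The actual saving in any given network depends on which layers are replaced.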