Research On Language Identification Method Based On Convolutional Network And Attention Mechanism

Posted on: 2022-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: X L Mao
Full Text: PDF
GTID: 2518306539998289
Subject: Engineering
Abstract/Summary:
Language identification (LID) refers to a computer's ability to predict the language of a speech utterance within a short period of time. As internationalization deepens, language identification plays an increasingly important role in multilingual speech processing systems and has become an indispensable component of multilingual intelligent speech technology. Although great progress has been made in language identification research, how to extract features better suited to the task and how to improve the nonlinear classification ability of the model remain the focus of current research. In recent years, deep learning has played an important role in language identification, especially in feature extraction, model construction, and classification, and has shown superior recognition performance. This thesis applies deep learning methods and, starting from the model, studies how model design affects language identification performance.

Firstly, this thesis adopts a language identification method based on CNN-BiLSTM. It introduces the CNN and the BiLSTM and then constructs the CNN-BiLSTM network. The CNN extracts local features of an image, and the LSTM extracts temporal features. The input to the network is the spectrogram. By extracting both local and temporal features, CNN-BiLSTM enriches the feature representation and improves the discriminability of recognition. Experiments are carried out on the Oriental language dataset and the Common Voice dataset. Compared with the convolutional network alone and the long short-term memory network alone, this method achieves better results.

Secondly, this thesis studies a CNN-BiGRU language identification method based on the attention mechanism. Although the CNN-BiLSTM network can extract local and temporal features, it has a complex structure and a large number of parameters, and it ignores the fact that language information is unevenly distributed across the input. To address these problems, a network combining CNN-BiGRU with an attention mechanism is proposed. BiGRU replaces BiLSTM, so the structure is simpler and has fewer parameters, which reduces the complexity of the network. The attention mechanism lets the network focus on the features associated with language information and ignore non-language information. This strengthens the network's learning of language information and in turn improves classification performance. Experiments are carried out on the Oriental language dataset and the Common Voice dataset, and the results are satisfactory. Compared with CNN-BiLSTM, the recognition performance of the network improves by about 3% and 7% on the Oriental language dataset and the Common Voice dataset, respectively, which verifies the effectiveness of the method.

Finally, this thesis adds the CBAM module to the convolutional network, computes attention in the channel and spatial dimensions, and builds a language identification model with dual attention mechanisms to improve the network's ability to extract features.
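The abstract does not give the exact form of the attention mechanism applied to the recurrent outputs. As a minimal sketch of the general idea, assuming dot-product scoring against a single learned query vector, the example below weights each frame-level feature vector and sums the frames into one utterance-level vector. The names `attention_pool`, `frames`, and `query`, and all numeric values, are hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Pool T frame vectors into one vector using attention weights.

    frames: list of T feature vectors (stand-ins for BiGRU outputs).
    query:  vector of the same dimension (assumed to be learned in training).
    """
    # One scalar relevance score per frame (dot product with the query).
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of the frames, dimension by dimension.
    pooled = [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Toy input: the middle frame aligns best with the query.
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
pooled, weights = attention_pool(frames, query)
```

In this toy input the middle frame receives the largest weight, which mirrors the stated goal of emphasizing frames that carry language information over those that do not.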
Keywords/Search Tags: Language identification, Spectrogram, CNN, CBAM, BiGRU, Dual attention mechanism