
Speaker Recognition Based On Multi-Resolution Frequency Features And Parallel Neural Network

Posted on: 2023-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: J C Zhang
GTID: 2568306800952409
Subject: Electronic and communication engineering
Abstract/Summary:
Extracting effective acoustic features from a speaker’s speech is crucial to the performance of a speaker recognition system. Frequency-domain features are generally extracted from speech with a Mel filter bank. Using several Mel filter banks that differ in the number of triangular filters yields frequency features of different resolutions, and these features are complementary to some extent. This thesis studies speaker recognition methods based on multi-resolution frequency features and parallel neural networks. The main work of the thesis is as follows:

1. A speaker recognition method based on multi-resolution frequency features and a parallel residual neural network (PRNN) is proposed. Two frequency features of different resolutions are extracted from a speaker’s speech by two Mel filter banks and used as the respective inputs of the parallel convolutional layers of the PRNN. A residual fusion structure extracts the fusion feature of the speaker’s speech from the outputs of all the parallel convolutional layers. A statistics pooling layer then extracts a statistical feature from the fusion feature, and this statistical feature is the input of the fully connected layer and the Softmax layer for speaker recognition. The experimental results show that the proposed speaker recognition system based on multi-resolution frequency features and the PRNN is effective for speaker recognition.

2. A speaker recognition method based on multi-resolution frequency features and a fusion gate parallel convolutional neural network (FG-PCNN) is proposed. Two frequency features of different resolutions are extracted from a speaker’s speech by two Mel filter banks and used as the respective inputs of the first parallel convolutional layer of the FG-PCNN. For each parallel convolutional layer of the FG-PCNN, the fusion gate mechanism extracts two gating weight matrices from the two input features of that layer, and the two matrices are applied as weights to the layer’s two output features, respectively. A convolutional neural network then extracts the fusion feature from the outputs of the last parallel convolutional layer of the FG-PCNN. Finally, the fusion feature of the speaker’s speech is used as the input of the classifier of the FG-PCNN for speaker recognition. The experimental results show that the proposed speaker recognition system based on multi-resolution frequency features and the FG-PCNN is effective for speaker recognition, and that the fusion gate mechanism can improve the performance of a speaker recognition system based on a parallel neural network.
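The multi-resolution feature extraction described in the abstract can be illustrated with a minimal NumPy sketch. It builds two triangular Mel filter banks that differ only in the number of filters, applies both to the same speech frames, and shows a simple statistics pooling step (mean and standard deviation over time). Everything specific here is an assumption made for illustration, not a detail taken from the thesis: the filter counts (40 and 80), the 16 kHz sample rate, the frame and hop sizes, the function names, and the random test signal. In the thesis the pooling is applied to the network's fusion feature; here it is applied directly to the log-Mel features only to show the operation.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style Hz-to-Mel conversion.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_filters + 2 points equally spaced on the Mel scale give the
    # left edge, centre, and right edge of every triangle.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_features(signal, n_filters, sample_rate=16000, n_fft=512, hop=160):
    """Log-Mel features, shape (n_frames, n_filters)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bank = mel_filter_bank(n_filters, n_fft, sample_rate)
    return np.log(power @ bank.T + 1e-10)

def statistics_pooling(features):
    """Concatenate the per-dimension mean and standard deviation over time."""
    return np.concatenate([features.mean(axis=0), features.std(axis=0)])

# One second of noise stands in for a speech signal (assumption).
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)

# Same signal, two Mel filter banks with different numbers of filters.
low_res = log_mel_features(signal, n_filters=40)   # shape (97, 40)
high_res = log_mel_features(signal, n_filters=80)  # shape (97, 80)
pooled = statistics_pooling(low_res)               # shape (80,)
```

Both feature matrices share the same time axis, so each can feed one branch of a parallel network frame by frame, while the frequency axis differs in resolution.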
Keywords/Search Tags:Speaker recognition, Multi-resolution frequency feature, Parallel neural network, Residual fusion structure, Fusion gate mechanism