Speaker Recognition Based On Multi-Resolution Frequency Features And Parallel Neural Network

Posted on:2023-12-04

Degree:Master

Type:Thesis

Country:China

Candidate:J C Zhang

Full Text:PDF

GTID:2568306800952409

Subject:Electronic and communication engineering

Abstract/Summary:

Extracting effective acoustic features from a speaker's speech is crucial to improving the performance of a speaker recognition system. Frequency-domain features are generally extracted from the speech with a Mel filter bank. Using several Mel filter banks that differ in the number of triangular filters yields frequency features of different resolutions, and these features are partly complementary. This thesis studies speaker recognition methods based on multi-resolution frequency features and parallel neural networks. The main work of the thesis is as follows:

1. A speaker recognition method based on multi-resolution frequency features and a parallel residual neural network (PRNN) is proposed. Two Mel filter banks extract two frequency features of different resolutions from a speaker's speech, and these serve as the respective inputs of the parallel convolutional layers of the PRNN. A residual fusion structure extracts the fusion feature of the speaker's speech from all the outputs of the parallel convolutional layers. A statistics pooling layer then extracts a statistical feature from the fusion feature, and this statistical feature is fed to the fully connected layer and the Softmax layer for speaker recognition. The experimental results show that the proposed speaker recognition system based on multi-resolution frequency features and the PRNN is effective for speaker recognition.

2. A speaker recognition method based on multi-resolution frequency features and a fusion gate parallel convolutional neural network (FG-PCNN) is proposed. Two Mel filter banks extract two frequency features of different resolutions from a speaker's speech, and these serve as the respective inputs of the first parallel convolutional layer of the FG-PCNN. In each parallel convolutional layer of the FG-PCNN, the fusion gate mechanism derives two gating weight matrices from the two input features of the layer and applies them as weights to the two output features of the layer, respectively. A convolutional neural network then extracts the fusion feature from the outputs of the last parallel convolutional layer of the FG-PCNN. Finally, the fusion feature of the speaker's speech is fed to the classifier of the FG-PCNN for speaker recognition. The experimental results show that the proposed speaker recognition system based on multi-resolution frequency features and the FG-PCNN is effective for speaker recognition, and that the fusion gate mechanism can improve the performance of a speaker recognition system based on a parallel neural network.
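The two feature-side steps described in the abstract, extracting frequency features at two resolutions with Mel filter banks of different sizes and reducing frame-level features to a fixed-length vector by statistics pooling, can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The filter counts (24 and 64), the frame length, the hop size, the 16 kHz sample rate and every function name are assumptions chosen for illustration. Statistics pooling is shown in its common mean-plus-standard-deviation form and is applied directly to the Mel features, whereas in the thesis it follows the network's fusion feature. The PRNN and FG-PCNN themselves are not reproduced.

```python
import numpy as np


def hz_to_mel(hz):
    # Common HTK-style Mel scale formula.
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale.
    edges = mel_to_hz(
        np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    )
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_features(signal, sample_rate, n_filters, n_fft=512, hop=160):
    """Frame the signal, take the power spectrum, apply one Mel filter bank."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bank = mel_filter_bank(n_filters, n_fft, sample_rate)
    # Small constant avoids log(0); result has shape (n_frames, n_filters).
    return np.log(power @ bank.T + 1e-10)


def statistics_pooling(frame_features):
    """Concatenate per-dimension mean and standard deviation over time."""
    return np.concatenate(
        [frame_features.mean(axis=0), frame_features.std(axis=0)]
    )


# One second of random noise stands in for speech (assumed 16 kHz).
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)

# Two resolutions from two filter banks; 24 and 64 are illustrative values.
low_res = log_mel_features(signal, 16000, n_filters=24)
high_res = log_mel_features(signal, 16000, n_filters=64)

# Fixed-length summary of a variable-length feature sequence.
pooled = statistics_pooling(high_res)
```

Both feature matrices share the same number of frames and differ only in the number of frequency channels, which is what lets them be fed to parallel branches frame by frame. The pooled vector has twice as many entries as there are channels, regardless of the utterance length.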

Keywords/Search Tags:

Speaker recognition, Multi-resolution frequency feature, Parallel neural network, Residual fusion structure, Fusion gate mechanism

Related items

1	Image Super-Resolution Reconstruction Based On Residual Fusion Neural Network
2	Research On Speaker Recognition Based On Acoustic Feature Enhancement And Multi-Scale Feature Fusion
3	Research On Single Image Super-resolution Based On Multi-level Residual Attention Fusion Network
4	Research Of Speaker Recognition Technology Based On Fusion Features
5	Speaker Recognition Based On Multi-information Fusion
6	Multi-speaker Recognition Based On Audio-video Feature Fusion In Smart Environment
7	Study On Image Super-resolution Reconstruction Method Based On Dense Feature Fusion
8	Speaker Recognition Based On Fusion Features And Deep Neural Networks
9	Facial Expression Recognition Based On Multi-scaled Feature Fusion
10	Research On Text-independent Speaker Recognition Based On Attention Mechanism