Speaker recognition aims to determine a speaker’s identity from the features of the speaker’s speech signal. It is widely used in forensic identification, voice assistants, and similar applications, and it is an active research area in speech signal processing. In this paper, speech feature fusion algorithms based on independent vector analysis (IVA) and parallel convolutional neural networks are proposed for speaker recognition. The main work of this paper is as follows:

1. A speech feature fusion algorithm based on IVA is proposed for speaker recognition. First, time domain (TD) features and frequency domain (FD) features are extracted from the speaker’s speech signal and arranged into a TD feature matrix and an FD feature matrix, respectively. A feature tensor is obtained by stacking the TD feature matrix and the FD feature matrix in parallel. The independent feature component (IFC) matrices of the TD features and the FD features are then estimated using IVA, and the fusion feature of the speaker’s speech is obtained by stacking the two IFC matrices in parallel. A speaker model can be obtained using IVA. Finally, the fusion feature is used as the input of a deep convolutional neural network to extract the deep feature of the speaker’s speech. The deep feature is used as the input of the fully connected (FC) layers, and the output of the FC layers is used as the input of the Softmax layer for speaker recognition.

2. A speech feature fusion algorithm based on a parallel convolutional neural network is proposed for speaker recognition. First, the IFC matrices of the TD features and the FD features are estimated from the speaker’s speech using IVA. Then, the IFC matrix of the TD features and the IFC matrix of the FD features are used as the inputs of the two branches of the parallel convolutional neural network, which extract the deep features of the TD features and the FD features, respectively. The fusion feature of the speaker’s speech is obtained by concatenating the deep features of the TD features and the FD features. Finally, the fusion feature is used as the input of the FC layers, and the output of the FC layers is used as the input of the Softmax layer for speaker recognition.
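
The data flow of the second algorithm (two parallel branches, concatenation, FC layer, Softmax) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: `branch_features` is a hypothetical stand-in that simply averages each row of an IFC matrix where a real branch would apply convolution and pooling layers, the IFC matrices are random placeholders rather than IVA outputs, and the FC weights are untrained. All function names and sizes are invented for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def branch_features(ifc_matrix):
    """Hypothetical stand-in for one CNN branch.

    Averages each row of the IFC matrix into one value. A real branch
    would be a stack of convolution and pooling layers.
    """
    return [sum(row) / len(row) for row in ifc_matrix]


def fully_connected(vector, weights, biases):
    """Single dense layer: one weight row and one bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]


def recognize(td_ifc, fd_ifc, weights, biases):
    """Run both branches, concatenate their outputs, then FC + softmax."""
    td_deep = branch_features(td_ifc)   # deep feature of the TD branch
    fd_deep = branch_features(fd_ifc)   # deep feature of the FD branch
    fused = td_deep + fd_deep           # fusion by concatenation
    probs = softmax(fully_connected(fused, weights, biases))
    return fused, probs


# Placeholder sizes and data (assumptions, not values from the paper).
rng = random.Random(0)
n_components, n_frames, n_speakers = 4, 10, 3
td_ifc = [[rng.gauss(0, 1) for _ in range(n_frames)] for _ in range(n_components)]
fd_ifc = [[rng.gauss(0, 1) for _ in range(n_frames)] for _ in range(n_components)]
weights = [[rng.gauss(0, 0.1) for _ in range(2 * n_components)] for _ in range(n_speakers)]
biases = [0.0] * n_speakers

fused, probs = recognize(td_ifc, fd_ifc, weights, biases)
predicted = max(range(n_speakers), key=probs.__getitem__)
print("fused length:", len(fused), "predicted speaker index:", predicted)
```

The fused vector has twice the length of a single branch output because the two deep features are concatenated rather than summed, so the FC layer sees the TD and FD information side by side.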