Research On Speaker Separation And Recognition Of Conference Voice Based On Voiceprint Recognition

Posted on:2021-05-13

Degree:Master

Type:Thesis

Country:China

Candidate:K L Chen

GTID:2428330647463633

Subject:Electronic and communication engineering

Abstract/Summary:

Multi-speaker conversational speech carries important information such as speaker identity, speaking venue, and the relationships between roles. Separating such speech by role makes it possible to determine who said what and when in a conversation, which improves both the accuracy of speech recognition and the utilization rate of speech information. With the development of the Internet and mobile technologies, speech recognition is increasingly used in daily life, and role separation of speech has attracted the attention of scholars at home and abroad. Existing research approaches the problem from both separation models and recognition equipment in order to improve separation accuracy, yet role separation still has problems in various scenarios. A conference is a typical multi-person conversation scenario: without prior information, the voice segments of different roles must be distinguished and the overlapping voice segments of multiple roles must be separated. Improving the accuracy of separating multi-role mixed speech is the research focus of this thesis.

This thesis takes speech signals in the conference scenario as its research object, compares and analyzes the speech separation methods used in different scenarios, and proposes a new, more efficient method that builds on existing multi-role mixed speech separation, with the goal of achieving correct role separation. Given the particular nature of the research object, the continuous speech is first segmented; the overlapping speech found during segmentation is then processed a second time; the resulting minimal single-role speech segments are clustered; and finally the parameters are optimized and the simulation results and separation performance metrics are output for the proposed separation method. The specific research contents are as follows.

First, combining the principles of the human auditory perception mechanism with an analysis of spectrogram characteristics, speech spectrograms in the conference scenario were simulated under different conditions. Analysis of the results shows that spectrograms are similar within a class (the same speaker) and different between classes (different speakers), a property that can be applied to multi-role speech separation in conference scenarios.

Second, by comparing typical multi-role speech segmentation techniques, a hash-similarity speech segmentation method based on voiceprint recognition is proposed. The spectrogram textures of different speakers differ in the distribution and direction of their time-frequency structure, and also differ in how spectral peaks and transients appear. At the same time, the spectrograms of the same speaker are stable over short time spans. Based on this similarity within a speaker and difference between speakers, a hash-similarity fusion algorithm based on voiceprint recognition first texture-encodes each frame of data, and a similarity judgment at the detection points is added to the encoding. This makes it possible to identify the change points between roles with reasonable accuracy when decoding.

Third, the overlapping speech segments of multiple roles are separated with a feature fusion model based on voiceprint recognition and Mel-frequency cepstral coefficients (MFCCs). The Mel-frequency cepstrum can simulate the human auditory system. Cepstral parameters are extracted in the Mel-scale frequency domain and then combined with the formant and pitch-period features of the voiceprint to build the model. The model separates a single-channel mixed speech segment into multiple single-channel speech segments, one per role, and extracts the key speaker's speech information from the overlapping segment, with good perceptual quality for the human ear.

Fourth, the multi-role speech segments are clustered, taking the smallest segments produced by segmentation as the unit. A combination of two support vector machine algorithms is used to classify the segments by role. To find the optimal hyperplane that separates the roles well, the most suitable support vector machine kernel function is selected according to the speaker features, and the extracted features are mapped into a high-dimensional feature space, thereby clustering the speech segments of different roles. Experimental results show that the algorithm is effective and accurate on clustering problems with non-linear decision surfaces, such as separating different speech segments.

In summary, this thesis takes multi-role mixed speech in the conference scenario as its research object and conducts in-depth research on multi-role speech segmentation, secondary segmentation of overlapping speech segments, and clustering of speech segments by role. Some useful results have been achieved for speech separation in complex scenarios, laying a foundation for improving speech recognition and for the better application of speech in complex scenes.
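The hash-similarity segmentation idea in the second point can be illustrated with a minimal sketch. It is not the algorithm of the thesis, which the abstract does not describe in detail. It assumes a simple average-hash style code: each block of spectrogram frames is reduced to one bit per frequency band, and a large Hamming distance between neighbouring blocks marks a candidate role change point. The function names, the block size, the threshold, and the synthetic two-tone "speakers" are all hypothetical choices made for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)

def block_hashes(spec, block=16):
    """One binary hash per block of frames (assumed average-hash style code)."""
    n_blocks = spec.shape[0] // block
    hashes = []
    for b in range(n_blocks):
        chunk = np.log1p(spec[b * block:(b + 1) * block])
        profile = np.median(chunk, axis=0)       # typical spectrum of the block
        hashes.append(profile > profile.mean())  # bit = band louder than average
    return np.array(hashes)

def change_points(hashes, threshold=0.05):
    """Blocks whose hash differs from the previous block's by more than threshold."""
    dist = np.mean(hashes[1:] != hashes[:-1], axis=1)  # normalized Hamming distance
    return [i + 1 for i, d in enumerate(dist) if d > threshold], dist

def two_tone(k1, k2, n, frame_len=256):
    """Hypothetical stand-in for one speaker: two tones centred on FFT bins."""
    t = np.arange(n)
    return (np.sin(2 * np.pi * k1 * t / frame_len)
            + np.sin(2 * np.pi * k2 * t / frame_len))

# Synthetic "conversation": speaker A for 16384 samples, then speaker B.
signal = np.concatenate([two_tone(6, 12, 16384), two_tone(48, 96, 16384)])
points, dist = change_points(block_hashes(spectrogram(signal)))
print(points)  # indices of blocks flagged as candidate change points
```

With the default settings each block spans 16 frames, so a flagged block index b corresponds to a change near sample b * 16 * 128. Real conference speech is far less stationary than these tones, so the threshold and block size would need tuning, and the thesis additionally fuses voiceprint features into the encoding, which this sketch does not attempt.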

Keywords/Search Tags:

Voiceprint recognition, Speaker separation, Mel frequency cepstrum, Hashing, Support vector machine

Related items

1	The Research Of Speaker Recognition Method Based On Cepstrum Features
2	Research On Voiceprint Recognition System And Pattern Recognition Algorithm
3	The Algorithm Of Speaker Identification In Noisy Environment Base On GMM/SVM
4	Speaker Recognition Based On Support Vector Machine
5	Text-Dependent Speaker Recognition Algorithm Study Based On Improved VQ
6	Research On Speaker Recognition Technology Based On Voiceprint Information Space
7	Speaker Recognition Based On Swarm Intelligence And Blind Source Separation
8	Research Of Speaker Recognition Based On Support Vector Data Description
9	Research On Systems For Voiceprint Recognition Based On Vector Quantization And Neural Network
10	Support Vector Machine Applications In Speaker Recognition