Multi-person Speech Separation Method Based On Computational Auditory Scene Analysis

Posted on: 2019-09-16; Degree: Master; Type: Thesis
Country: China; Candidate: K L Wang
GTID: 2438330551456333; Subject: Computer technology
Abstract/Summary:
Single-channel speech separation (SCSS) is the task of extracting and restoring the original clean speech from a single-channel mixed speech signal without sufficient prior information. The human ear separates speech very effectively, and this ability inspired CASA-based single-channel speech separation (CASA-based SCSS), which has become an important branch of research in the field. Based on the theory of computational auditory scene analysis (CASA), this thesis studies the separation of single-channel multi-speaker mixed speech. The specific contents are as follows:
(1) Speech features are analyzed. Because speech is short-time stationary, it is transformed frame by frame into the frequency domain with the Fourier transform; the speech spectrum is represented by a spectrogram, and the speech cepstrum by an analogous pitch spectrum.
(2) A more precise pitch-period detection method is developed. The pitch period is estimated from the continuous pitch-period traces that appear on the pitch spectrum, which gives a more accurate estimate and provides the basis for the subsequent separation.
(3) The effects of different types of noise on the speech spectrum and cepstrum at different signal-to-noise ratios (SNRs) are compared and analyzed, and a method for locating spectral harmonics at low SNR is studied.
(4) A signal-noise separation method is studied, in which speech segments belonging to the same speaker are separated using the pitch period as a cue. The position of each speech harmonic is derived from the pitch period, a comb spectrum is used to extract the harmonic spectrum, and the speech is reconstructed by the inverse Fourier transform.
(5) A multi-speaker mixed-speech separation method is studied. A Gaussian mixture model (GMM) is used to identify the speaker, the separated speech segments are matched according to the speaker-recognition results, and the segments from the same speaker are concatenated in time order, which separates the multi-speaker mixture.
Experiments show that the proposed method removes many kinds of typical noise interference effectively. When two people speak simultaneously, the speech of each speaker can be separated with good sound quality.
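Items (1) and (2) rest on the cepstrum: a harmonic (voiced) frame has a log spectrum that ripples every f0 Hz, so its cepstrum peaks at a quefrency equal to the pitch period. The sketch below is a minimal, hedged illustration of plain per-frame cepstral peak picking, not the thesis's trace-based method on the pitch spectrum. The function names, the Hamming window, and the 60 to 400 Hz search range are assumptions chosen for illustration.

```python
import numpy as np


def real_cepstrum(frame):
    """Real cepstrum of one frame: inverse FFT of the log-magnitude spectrum."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)  # floor avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))


def estimate_pitch_period(frame, fs, f0_min=60.0, f0_max=400.0):
    """Pitch period in samples: the largest cepstral peak whose quefrency
    corresponds to a fundamental between f0_min and f0_max (assumed range)."""
    ceps = real_cepstrum(frame)
    q_lo = int(fs / f0_max)                       # shortest plausible period
    q_hi = min(int(fs / f0_min), len(ceps) // 2)  # longest plausible period
    return q_lo + int(np.argmax(ceps[q_lo:q_hi]))


# Usage: a synthetic voiced frame with f0 = 200 Hz at fs = 16 kHz (period 80 samples).
fs = 16000
t = np.arange(640) / fs  # one 40 ms frame
frame = sum(np.cos(2 * np.pi * 200 * k * t) for k in range(1, 40))
period = estimate_pitch_period(frame, fs)
print(period, fs / period)
```

Running this on successive frames yields a pitch contour; the thesis instead exploits the continuity of pitch traces across frames, which a per-frame peak picker like this one does not use.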
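Item (3) compares noise effects "under different signal-to-noise ratios", which requires building test mixtures at a controlled SNR. A minimal sketch, assuming the usual power-ratio definition SNR = 10·log10(P_speech / P_noise): the noise is rescaled to the target power and then added. The thesis does not list its noise types here, so white noise and a 200 Hz tone are hypothetical stand-ins, as are the function names.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db and add it to `speech`.

    Returns (mixture, scaled_noise). `noise` must be at least as long as `speech`.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power is p_speech / 10^(snr_db / 10); scale the amplitude accordingly.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = scale * noise
    return speech + scaled_noise, scaled_noise


def snr_db(reference, noise):
    """SNR in dB between a reference signal and an additive noise signal."""
    return 10 * np.log10(np.mean(reference ** 2) / np.mean(noise ** 2))


# Usage: a 200 Hz tone stands in for speech, white noise for one noise type.
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 200 * t)
noise = np.random.default_rng(0).standard_normal(fs)
for target in (10, 0, -5):
    mixture, scaled = mix_at_snr(speech, noise, target)
    print(target, round(snr_db(speech, scaled), 2))
```

Feeding such mixtures to a spectrogram or cepstrum routine at decreasing SNR is one way to reproduce the kind of comparison item (3) describes.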
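Item (4) locates harmonics from the pitch period and keeps them with a comb spectrum before an inverse Fourier transform. The sketch below shows that idea under stated assumptions: f0 is already known (f0 = fs / pitch period), the comb is a binary mask whose teeth are ±40 Hz wide around each k·f0, and a single unwindowed frame is processed. The thesis's actual comb shape and its frame/overlap handling are not given here, and all names are hypothetical. A practical system would apply this per STFT frame with overlap-add.

```python
import numpy as np


def comb_mask(n_fft, fs, f0, half_width_hz=40.0):
    """Boolean mask over rFFT bins: True within half_width_hz of some harmonic k*f0, k >= 1."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    nearest_k = np.round(freqs / f0)           # index of the closest harmonic
    distance = np.abs(freqs - nearest_k * f0)
    return (nearest_k >= 1) & (distance <= half_width_hz)


def extract_harmonics(frame, fs, f0, half_width_hz=40.0):
    """Keep only the comb teeth at multiples of f0 and return to the time domain."""
    spectrum = np.fft.rfft(frame)
    mask = comb_mask(len(frame), fs, f0, half_width_hz)
    return np.fft.irfft(spectrum * mask, n=len(frame))


# Usage: a 200 Hz target mixed with a 530 Hz interferer (not a multiple of 200 Hz).
fs = 16000
t = np.arange(1600) / fs  # 100 ms frame, 10 Hz bin spacing
target = np.cos(2 * np.pi * 200 * t)
interferer = 0.8 * np.cos(2 * np.pi * 530 * t)
recovered = extract_harmonics(target + interferer, fs, f0=200.0)
print(np.max(np.abs(recovered - target)))
```

Both tones fall exactly on FFT bins in this toy case, so the interferer is removed completely; with real speech, components that land inside a comb tooth (for example, overlapping harmonics of two talkers) pass through the mask.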
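Item (5) identifies the speaker of each separated segment with a Gaussian mixture model and joins each speaker's segments in time order. A minimal sketch, assuming one diagonal-covariance GMM per enrolled speaker, trained by EM on precomputed feature frames, with each segment assigned to the model that gives the highest average log-likelihood. The feature type (for example, MFCCs), the model sizes, and the names `DiagonalGMM`, `identify_speaker` and `group_by_speaker` are hypothetical. EM is written out in NumPy to keep the block self-contained; scikit-learn's `sklearn.mixture.GaussianMixture` with `covariance_type="diag"` offers a comparable model whose `score` method returns the mean per-sample log-likelihood.

```python
import numpy as np


def _log_gauss_diag(X, means, variances):
    """Log N(x | mean_j, diag(var_j)) for every row of X and component j -> shape (n, m)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def _logsumexp(a):
    """Numerically stable log(sum(exp(a))) along axis 1."""
    peak = np.max(a, axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.sum(np.exp(a - peak), axis=1))


class DiagonalGMM:
    """Gaussian mixture with diagonal covariances, trained by EM."""

    def __init__(self, n_components=4, n_iter=50, reg=1e-3, seed=0):
        self.m, self.n_iter, self.reg, self.seed = n_components, n_iter, reg, seed

    def _joint_log_prob(self, X):
        return _log_gauss_diag(X, self.means, self.vars) + np.log(self.weights)

    def fit(self, X):
        rng = np.random.default_rng(self.seed)
        n, _ = X.shape
        self.means = X[rng.choice(n, self.m, replace=False)]  # random frames as initial means
        self.vars = np.tile(X.var(axis=0) + self.reg, (self.m, 1))
        self.weights = np.full(self.m, 1.0 / self.m)
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each frame.
            log_p = self._joint_log_prob(X)
            resp = np.exp(log_p - _logsumexp(log_p)[:, None])
            # M-step: re-estimate weights, means and diagonal variances.
            nk = resp.sum(axis=0) + 1e-10
            self.weights = nk / n
            self.means = resp.T @ X / nk[:, None]
            second_moment = resp.T @ (X ** 2) / nk[:, None]
            self.vars = np.maximum(second_moment - self.means ** 2, 0.0) + self.reg
        return self

    def score(self, X):
        """Average per-frame log-likelihood of X under the model."""
        return float(np.mean(_logsumexp(self._joint_log_prob(X))))


def identify_speaker(models, features):
    """Return the name of the speaker model with the highest average log-likelihood."""
    return max(models, key=lambda name: models[name].score(features))


def group_by_speaker(models, segments):
    """segments: iterable of (start_time, features, samples).
    Returns {speaker: samples of that speaker's segments concatenated in time order}."""
    parts = {}
    for _, feats, samples in sorted(segments, key=lambda s: s[0]):
        parts.setdefault(identify_speaker(models, feats), []).append(samples)
    return {spk: np.concatenate(chunks) for spk, chunks in parts.items()}


# Usage with synthetic 2-D "features" standing in for real per-frame features.
rng = np.random.default_rng(1)
train = {"A": rng.normal(0.0, 1.0, (300, 2)), "B": rng.normal(3.0, 1.0, (300, 2))}
models = {name: DiagonalGMM(n_components=2).fit(X) for name, X in train.items()}
segments = [
    (0.5, rng.normal(3.0, 1.0, (50, 2)), np.ones(4)),
    (0.0, rng.normal(0.0, 1.0, (50, 2)), np.zeros(4)),
    (1.0, rng.normal(0.0, 1.0, (50, 2)), np.full(4, 2.0)),
]
streams = group_by_speaker(models, segments)
print({k: v.tolist() for k, v in streams.items()})
```

Scoring by average rather than summed log-likelihood keeps segments of different lengths comparable; this is a common choice in GMM speaker identification, not necessarily the one made in the thesis.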
Keywords/Search Tags: Speech separation, Computational auditory scene analysis, Single channel, Multi-speaker, Pitch period, Separation of signal and noise