Application Of Blind Speech Separation Technology

Posted on: 2015-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: J M Wang
Full Text: PDF
GTID: 2308330473953170
Subject: Signal and Information Processing
Abstract/Summary:
We are surrounded by sounds. In a noisy environment it is difficult to capture the desired speech and to converse comfortably. For both human-machine and human-human communication, it is therefore important to be able to separate and extract a target speech signal from noisy observations. Blind source separation (BSS) is an approach for estimating source signals using only the mixtures observed at each input channel. The estimation is performed without information about each source, such as its frequency characteristics or location, or about how the sources are mixed. BSS is thus an important applied technology in daily life, and its use in building comfortable acoustic communication channels between humans and machines is widely accepted.

This thesis focuses on blind speech separation of linear convolutive mixtures when the underlying system is overdetermined (m > n), and of linear mixtures when the underlying system is underdetermined (m < n), where m is the number of mixtures and n is the number of sources. The specific research work is as follows:

1. The first part addresses the blind separation of n sources from m linear convolutive mixtures when the underlying system is overdetermined. The steps are as follows. First, each time-domain microphone observation is converted into a frequency-domain time series by a short-time Fourier transform (STFT). Second, the observations in each frequency bin are separated by the FastICA algorithm. Finally, after permutation alignment, scaling correction, time-frequency (T-F) masking, and the inverse STFT, estimates of the source signals are obtained.

2. The second part addresses the blind separation of n sources from m mixtures when the underlying system is underdetermined, as applied to mixtures with only attenuations and delays (i.e., no reverberation). The separation is carried out in the frequency domain, where the representation is sparse, at least for mixed speech. The approach is illustrated experimentally for m = 2 sensors and n = 3 speech sources. The procedure is organized in three stages. First, the matrix of relative attenuations is inferred by angular clustering of the magnitude of the input using a potential function, which yields a coarse partition of the data points according to their nearest sources. Second, for each partition, the differential delay is inferred by shifting the sensor channels and scattering the real and imaginary components until the cluster reappears. Third, given the attenuations, the magnitudes of the spectral coefficients are taken to be Laplacian distributed, which leads to minimizing the sum of magnitudes subject to the mixing equations. The resulting problem is an instance of second-order cone programming.
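
The first stage of the underdetermined procedure (angular clustering with a potential function) can be illustrated with a small sketch for m = 2. This is not the thesis's implementation. It assumes a triangular kernel as the potential function, a toy non-negative mixture in which one source dominates each data point (standing in for sparse T-F magnitudes), and hypothetical names and parameters (`make_toy_mixture`, `potential`, `estimate_directions`, `sharpness`, `min_sep_deg`, `leak`) chosen only for illustration.

```python
import math
import random


def make_toy_mixture(true_deg, n_points=3000, leak=0.01, seed=0):
    """Toy stand-in for T-F magnitudes at two sensors (assumption):
    one source dominates each point, the others leak in weakly."""
    rng = random.Random(seed)
    cols = [(math.cos(math.radians(d)), math.sin(math.radians(d)))
            for d in true_deg]
    x1, x2 = [], []
    for _ in range(n_points):
        dominant = rng.randrange(len(cols))
        a = b = 0.0
        for j, (c1, c2) in enumerate(cols):
            s = rng.expovariate(1.0) * (1.0 if j == dominant else leak)
            a += c1 * s
            b += c2 * s
        x1.append(a)
        x2.append(b)
    return x1, x2


def potential(x1, x2, grid_deg, sharpness=9.0):
    """Magnitude-weighted sum of triangular kernels centred on each
    data angle, evaluated at every candidate direction in grid_deg."""
    angles = [math.atan2(b, a) for a, b in zip(x1, x2)]
    weights = [math.hypot(a, b) for a, b in zip(x1, x2)]
    quarter_pi = math.pi / 4
    values = []
    for g in grid_deg:
        theta = math.radians(g)
        total = 0.0
        for ang, w in zip(angles, weights):
            alpha = abs(sharpness * (theta - ang))
            if alpha < quarter_pi:  # kernel is zero outside this support
                total += w * (1.0 - alpha / quarter_pi)
        values.append(total)
    return values


def estimate_directions(x1, x2, n_sources, min_sep_deg=10):
    """Pick the n_sources highest potential peaks that are at least
    min_sep_deg apart (greedy non-maximum suppression)."""
    grid = list(range(0, 91))  # candidate directions, 1-degree steps
    values = potential(x1, x2, grid)
    order = sorted(range(len(grid)), key=lambda i: values[i], reverse=True)
    picked = []
    for i in order:
        if all(abs(grid[i] - p) >= min_sep_deg for p in picked):
            picked.append(grid[i])
        if len(picked) == n_sources:
            break
    return sorted(picked)


x1, x2 = make_toy_mixture([15, 45, 75])
directions = estimate_directions(x1, x2, n_sources=3)
# Each direction gives one unit-norm column of relative attenuations.
attenuations = [(math.cos(math.radians(d)), math.sin(math.radians(d)))
                for d in directions]
print(directions)
```

In this sketch each recovered angle corresponds to one column of the attenuation matrix, and assigning every data point to its nearest recovered angle would give the coarse partition used by the later stages. The delay estimation and the second-order cone program are not covered here.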
Keywords/Search Tags:Blind source separation, linear mixture, convolutive mixture, clustering