
Monaural Speech Segregation Based On Computational Auditory Scene Analysis

Posted on: 2016-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: R J Li
Full Text: PDF
GTID: 2308330470957909
Subject: Circuits and Systems
Abstract/Summary:
In natural environments, a typical auditory scene contains acoustic interference such as environmental noise, music, or a competing voice. Noise distorts the target speech and poses a substantial difficulty for many applications of speech technology. Monaural speech segregation is the problem of segregating target speech from a mixture recorded with a single microphone. Computational auditory scene analysis (CASA) is an effective approach to monaural speech segregation and an active research topic in speech signal processing.

This dissertation studies monaural speech segregation based on CASA. We investigate auditory segmentation, the labeling of high-frequency time-frequency (T-F) units, and mask smoothing after auditory grouping, and on this basis we propose several new approaches. Our system is systematically compared with a previous system. The main contributions of this dissertation are as follows:

(1) To improve the accuracy of labeling high-frequency units and of auditory segmentation, we propose an improved grouping approach for monaural voiced speech segregation. In the grouping stage, the method first employs different features to label T-F units in the low- and high-frequency ranges; an enhanced envelope autocorrelation function (EEACF) is used to label the units in the high-frequency range. Second, onset/offset analysis produces auditory segments in which target speech and interference are well separated. These segments are then alternately grouped using the complementary binary masks of the segregated voiced speech. Systematic evaluation shows that the algorithm outperforms the previous system.

(2) Speech segregation based on CASA amounts to estimating a binary T-F mask, which is then used to synthesize the target speech waveform from the mixture.
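To illustrate the high-frequency labeling idea in contribution (1): a high-frequency unit carries the pitch in its amplitude envelope, so the unit can be labeled by the normalized autocorrelation of that envelope at the pitch lag. The sketch below is a minimal numpy illustration, not the thesis's EEACF — the envelope extraction (half-wave rectification plus a ~1 ms moving average), the threshold `theta`, and the assumption of a known pitch `f0` are all illustrative choices.

```python
import numpy as np

def envelope_autocorr_label(unit, fs, f0, theta=0.85):
    """Label a high-frequency T-F unit as voiced-target dominated when the
    normalized autocorrelation of its amplitude envelope peaks at the pitch lag.
    Illustrative stand-in for EEACF-style labeling; theta is a placeholder."""
    # Crude envelope: half-wave rectification + ~1 ms moving-average lowpass
    rect = np.maximum(unit, 0.0)
    win = max(1, fs // 1000)
    env = np.convolve(rect, np.ones(win) / win, mode="same")
    env -= env.mean()
    lag = int(round(fs / f0))             # pitch period in samples
    if lag >= env.size:
        return False
    a, b = env[:-lag], env[lag:]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    if denom == 0.0:
        return False
    return float(np.dot(a, b) / denom) > theta

# Toy check: a 3 kHz carrier whose envelope beats at F0 = 200 Hz, vs. noise
fs, f0 = 16000, 200.0
t = np.arange(int(0.02 * fs)) / fs        # one 20 ms unit
voiced = (1 + np.cos(2 * np.pi * f0 * t)) * np.sin(2 * np.pi * 3000 * t)
noise = np.random.default_rng(0).standard_normal(t.size)
print(envelope_autocorr_label(voiced, fs, f0))  # → True
print(envelope_autocorr_label(noise, fs, f0))
```

The modulated carrier's envelope repeats every pitch period, so its envelope autocorrelation at the pitch lag is close to 1, while for noise it stays near 0.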
Since noise interference reduces the accuracy of the extracted auditory cues, the estimated mask usually contains many small noise segments and broken target segments, which degrade the quality of the segregated speech. Combining mask estimation with morphological binary image processing, we propose an improved approach to speech segregation based on mask smoothing. First, to remove isolated segments from the auditory streams, the method smooths the mask by morphological opening. Next, to recover the missing T-F units of the target segments, the mask is smoothed by morphological closing. Systematic evaluation shows that the algorithm yields better quality for the synthesized target speech.
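The opening/closing smoothing in contribution (2) can be sketched with hand-rolled 4-connected binary morphology. This is a minimal numpy sketch; the 4-connected (cross-shaped) structuring element and the toy mask shapes are illustrative assumptions, since the abstract does not specify the structuring elements the thesis uses.

```python
import numpy as np

def dilate(mask):
    # 4-connected binary dilation: a unit turns on if it or any neighbor is on
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def erode(mask):
    # Erosion is the dual of dilation applied to the complement
    return ~dilate(~mask)

def smooth_mask(mask):
    """Opening (erode then dilate) removes isolated noisy units;
    closing (dilate then erode) then fills small holes in target segments."""
    opened = dilate(erode(mask))
    return erode(dilate(opened))

# Toy mask (frequency x time): a target segment with one missing unit,
# plus an isolated single-unit noise speck
m = np.zeros((10, 12), dtype=bool)
m[1:7, 1:9] = True    # target segment
m[3, 5] = False       # broken unit inside the segment
m[9, 0] = True        # isolated noise unit
s = smooth_mask(m)
print(bool(s[9, 0]))  # speck removed by opening → False
print(bool(s[3, 5]))  # hole filled by closing → True
```

Opening first, then closing, matches the order described above: isolated noise specks are deleted before the gaps inside target segments are filled, so the closing step does not cement spurious units into the mask.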
Keywords/Search Tags: CASA, Monaural Speech Segregation, Onset/offset, EEACF, Mask Smoothing