
The Research Of Monaural Speech Segregation Based On Computational Auditory Scene Analysis

Posted on: 2014-09-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 1268330425980895
Subject: Control Science and Engineering
Abstract/Summary:
A monaural speech segregation system extracts target speech from a noisy, single-channel recording. It usually serves as the front end for speech and speaker recognition. A system based on Computational Auditory Scene Analysis (CASA) models the human auditory system computationally to perform this segregation. Because its processing of the mixture resembles human sound perception, CASA has become one of the most active research topics in speech segregation in recent years. This dissertation studies CASA, reviews the structure and history of CASA-based speech segregation systems, and proposes an improved monaural speech segregation system. The main contributions are as follows:

(1) An improved threshold selection technique for energy extraction. Response energy is an important auditory feature for speech segregation. The conventional method applies a single constant threshold for energy extraction. Because noise types are varied and unknown, the interference differs from channel to channel, and a constant threshold cannot remove noise units effectively. This dissertation therefore selects a threshold for each channel based on that channel's average mixture energy. The proposed method removes background intrusion effectively and yields a significant improvement in energy extraction.

(2) An improved iterative pitch tracking algorithm based on the estimated target source. The conventional pitch tracking algorithm does not remove interference when detecting the target pitch, which inevitably causes errors in the pitch estimates. The proposed algorithm estimates pitch periods only from units labeled as target. It first removes the interference units, computes the pitch period of each frame, and then relabels the target units based on the estimated pitch contours. Target-unit estimation and pitch detection alternate until the pitch contours become stable. Experimental results show that the proposed algorithm is more robust and accurate than the conventional pitch tracking method under various types of interference.

(3) An improved method for unvoiced speech segregation. After voiced speech has been segregated, unvoiced speech must be extracted from the residual noise. The proposed method extracts unvoiced speech by spectral subtraction: the noise energy in each unvoiced segment is estimated with a distance-weighted noise estimation algorithm, and spectral subtraction is then applied to extract and label the target unvoiced units. The proposed method outperforms the conventional one under time-varying noise, improving the accuracy of noise estimation and yielding better unvoiced speech segregation.

(4) Morphological image processing to improve the mask smoothing module. The mask obtained after grouping is used for speech resynthesis. Because of errors in pitch tracking and target-unit labeling, the mask usually contains residual noise particles and broken auditory segments, which degrade the quality of the resynthesized speech. The proposed method combines dilation and erosion to remove the unwanted particles and repair the broken auditory segments while preserving the details of the original mask, further enhancing the quality of the segregated speech.
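
As a rough illustration of contributions (1) and (4), the following Python sketch shows a per-channel energy threshold and a simple morphological smoothing of a binary time-frequency mask. It is not the dissertation's implementation: the function names, the `scale` factor, the use of the plain channel mean as the threshold, the 3x3 structuring element, and the closing-then-opening order are all assumptions made for illustration.

```python
import numpy as np


def energy_mask(energy, scale=1.0):
    """Keep the T-F units whose energy exceeds a per-channel threshold.

    energy: array of shape (channels, frames) with the mixture energy of
    each time-frequency unit. The threshold of a channel is assumed here
    to be `scale` times that channel's mean energy (hypothetical rule).
    """
    energy = np.asarray(energy, dtype=float)
    thresholds = scale * energy.mean(axis=1, keepdims=True)
    return energy > thresholds


def _neighbourhood(mask):
    """Stack the 3x3 neighbourhood of every unit; outside counts as False."""
    padded = np.pad(mask, 1, constant_values=False)
    rows, cols = mask.shape
    return np.stack([padded[r:r + rows, c:c + cols]
                     for r in range(3) for c in range(3)])


def dilate(mask):
    """Binary dilation: a unit is set if any unit in its 3x3 window is set."""
    return _neighbourhood(mask).any(axis=0)


def erode(mask):
    """Binary erosion: a unit is kept only if its whole 3x3 window is set."""
    return _neighbourhood(mask).all(axis=0)


def smooth_mask(mask):
    """Fill small gaps, then drop isolated particles (assumed order)."""
    mask = np.asarray(mask, dtype=bool)
    # Closing (dilate, then erode) fills small holes. The mask is padded
    # first so that units on the border are not eroded away.
    padded = np.pad(mask, 1, constant_values=False)
    closed = erode(dilate(padded))[1:-1, 1:-1]
    # Opening (erode, then dilate) removes particles smaller than 3x3.
    return dilate(erode(closed))
```

A constant threshold would treat a quiet channel and a loud channel alike, whereas `energy_mask` adapts to each channel's own level. Note that a 3x3 opening also removes genuine structures thinner than three units, so a real system that aims to preserve mask details would need a more careful choice of structuring element than this sketch makes.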
Keywords/Search Tags: Computational Auditory Scene Analysis, Speech Segregation, Energy Extraction, Pitch Tracking, Unvoiced Speech Segregation, Mask Smoothing