
Auditory Mechanism Based Robust Feature Extraction And Its Application In Speaker Recognition

Posted on: 2014-02-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D T You
Full Text: PDF
GTID: 1268330392972593
Subject: Computer Science and Technology
Abstract/Summary:
Speech features play an important role in automatic speech recognition; their identifiability and distinguishability directly affect recognition performance. Over the past decades, great progress has been made in automatic speech recognition, and several representative speech features have been proposed that substantially improved its performance. However, many studies have found that recognition performance remains far below human performance in non-stationary noise environments, especially at low signal-to-noise ratios (SNR). The limited robustness of speech features is one fundamental reason. Research has also shown that simulating the human hearing mechanism helps to improve feature robustness, but such work remains insufficient: the robustness of the hearing mechanism has not been fully exploited, and further study in this area is needed.

To address the problem of non-stationary noise, this thesis focuses on auditory-mechanism-based robust feature extraction, motivated by the strong robustness of the human hearing system. It proposes several robust features, all of which are evaluated in speaker recognition. The main contributions are as follows:

(1) We propose a robust feature extraction method based on the cochlear non-linear mechanism. First, we analyse the Gammatone filter-bank and point out its shortcomings in signal processing; we then design a filter-bank that more faithfully reflects the signal-processing mechanism of the basilar membrane. Next, in view of the importance of the coupling between the basilar membrane and the tectorial membrane, we design a frequency-selectivity gain function based on this coupling mechanism. On this basis, we propose a feature extraction method grounded in the cochlear mechanism.
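For context, the Gammatone filter-bank that contribution (1) takes as its starting point can be sketched as follows. This is a minimal illustration of the standard 4th-order Gammatone design with ERB-spaced center frequencies (Glasberg & Moore constants), not the thesis's improved basilar-membrane filter-bank; the function names and parameter defaults are illustrative.

```python
import numpy as np

def hz_to_erb_rate(f):
    """ERB-rate scale (Glasberg & Moore): E(f) = 21.4 * log10(4.37e-3 * f + 1)."""
    return 21.4 * np.log10(4.37e-3 * f + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Impulse response g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t),
    with bandwidth b tied to the ERB at fc: ERB(fc) = 24.7 * (4.37e-3 * fc + 1)."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37e-3 * fc + 1.0)
    b = 1.019 * erb  # common bandwidth scaling for a 4th-order gammatone
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gammatone_bank(n_filters=32, fs=16000, low=50.0, high=7000.0):
    """Center frequencies equally spaced on the ERB-rate scale, one IR per channel."""
    e = np.linspace(hz_to_erb_rate(low), hz_to_erb_rate(high), n_filters)
    cfs = erb_rate_to_hz(e)
    return cfs, [gammatone_ir(fc, fs) for fc in cfs]
```

Filtering a signal through the bank amounts to convolving it with each channel's impulse response; the thesis's critique concerns the limits of this linear design relative to the basilar membrane's non-linear behaviour.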
To evaluate the proposed feature objectively, we also design a model-independent feature evaluation method. Experimental results show that the proposed feature is more robust than Mel-Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) features.

(2) We propose a robust feature extraction method based on the auditory cortex mechanism. First, we describe the relationship between the underlying structures of the acoustic signal and their neural representation in the auditory cortex, and present a method for extracting these underlying structures. Then, to analyse the effectiveness of the extracted structures, we propose a validity criterion and an effectiveness measure for them, together with a corresponding optimization method. Finally, building on the above, a robust feature extraction method is proposed. Experimental results show that the proposed feature is more robust than the MFCC and PLP features.

(3) We propose a robust feature extraction method based on the auditory source-separation mechanism. First, following this mechanism, we use the underlying structures of speech and noise to approximate auditory prior knowledge of speech and noise, and concatenate these structures into a signal-decomposition dictionary. Then, to address the source-separation distortion caused by high mutual coherence between the speech and noise structures, we propose a concatenated-dictionary optimization method and theoretically prove its convergence. Experiments show that the optimization improves the effectiveness of concatenated-dictionary-based signal processing. Finally, a robust feature is proposed on the basis of these components.
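The concatenated-dictionary idea in contribution (3) can be illustrated with a generic sparse-coding sketch: decompose a noisy frame over [D_speech | D_noise] and rebuild the speech component from the speech atoms alone. This uses plain matching pursuit as a stand-in decomposition algorithm; the thesis's actual extraction and optimization procedures are not reproduced here, and all names are illustrative.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct unit-norm atoms,
    the quantity the thesis's dictionary optimization seeks to reduce."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    return G.max()

def matching_pursuit(x, D, n_iter=10):
    """Greedy sparse decomposition of x over dictionary D (columns = unit-norm atoms)."""
    r = x.copy()
    coef = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ r                  # correlate residual with every atom
        k = np.argmax(np.abs(corr))     # pick the best-matching atom
        coef[k] += corr[k]
        r = r - corr[k] * D[:, k]       # deflate the residual
    return coef

def separate(x, D_speech, D_noise, n_iter=20):
    """Decompose x over the concatenated dictionary [D_speech | D_noise],
    then reconstruct the speech component from the speech atoms only."""
    D = np.hstack([D_speech, D_noise])
    coef = matching_pursuit(x, D, n_iter)
    k_speech = D_speech.shape[1]
    return D_speech @ coef[:k_speech]
```

When the two sub-dictionaries share highly coherent atoms, the greedy step can attribute speech energy to noise atoms (and vice versa), which is exactly the distortion the thesis's coherence-reducing optimization targets.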
Experiments show that the proposed speech feature not only outperforms the MFCC and PLP features, but is also better than the auditory-cortex-based feature described above.

(4) We propose a robust feature extraction framework based on auditory mechanisms. The framework consists of two cascaded layers: a target-speech abstraction layer based on the sound-source-separation mechanism, and a robust-feature layer based on the cochlear non-linear processing mechanism. Depending on the type of target speech, the abstraction layer can be further subdivided into two steps: source-separation-based mixed-source separation, which extracts the speech signal from noisy speech, and voice activity detection (VAD), which extracts the speech segments and strips the non-speech segments from the speech signal. For forward compatibility and to provide effective speech segments, we also propose a VAD method based on the source-separation mechanism. Experimental results show that the feature extracted by this framework is not only more robust than the MFCC and PLP features, but also better than the three features above; in addition, VAD experiments show that the proposed VAD method outperforms the baseline VAD method.
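For reference, a typical VAD baseline of the kind contribution (4) compares against is a simple frame-energy detector. This sketch is a generic energy-threshold baseline, not the thesis's source-separation-based VAD; the threshold and frame parameters are illustrative defaults.

```python
import numpy as np

def energy_vad(signal, fs, frame_ms=25, hop_ms=10, threshold_db=-30.0):
    """Frame-level VAD: mark a frame as speech when its log energy
    exceeds `threshold_db` relative to the loudest frame."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n = 1 + max(0, (len(signal) - frame) // hop)
    energy = np.array([np.sum(signal[i * hop : i * hop + frame] ** 2)
                       for i in range(n)])
    log_e = 10.0 * np.log10(energy + 1e-12)   # floor avoids log(0) on silence
    return log_e > (log_e.max() + threshold_db)
```

Such energy-based detectors degrade quickly in non-stationary noise, which motivates replacing them with a separation-driven front end as the framework's first layer does.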
Keywords/Search Tags: auditory mechanism, feature extraction, sparse representation, dictionary optimization, robustness