Robust Speech Feature Extraction Methods Based On Computational Auditory Perception And Tensor Models

Posted on: 2011-12-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Wu
GTID: 1118330338983871
Subject: Computer software and theory
Abstract/Summary:
Computational auditory perception is an important research field in neuroscience that aims to simulate the auditory processing mechanisms of humans and to investigate new auditory-based information processing methods and systems. Research on computational auditory perception models has theoretical significance and application value for new speech processing techniques such as recovery of functional hearing, large-scale automatic speech recognition, speaker recognition, and human-computer interaction. Based on the principles of computational auditory perception, this dissertation addresses several key issues in speech signal processing, including auditory neural processing mechanisms, feature extraction algorithms, and recognition systems, with particular emphasis on robust speech feature extraction in complex environments.

The main contributions of this dissertation are as follows:

1. We propose a new method called Nonnegative Tensor Principal Component Analysis (NTPCA), which calculates the projection matrices of the different modes under a tensor structure. Based on the spectro-temporal receptive field (STRF) model of the primary auditory cortex, we introduce a higher-order tensor model with four modes (time, frequency, scale and phase) that unifies the temporal and spectral characteristics of the speech signal in a single cortical representation model. The NTPCA algorithm is employed to extract speech features, yielding a new robust speech feature called Gabor Tensor Cepstral Coefficients (GTCC). The sparsity constraints of the algorithm preserve the clean speech components, which are sparsely distributed, and suppress the noise components, which are densely distributed. The experimental results show that the GTCC feature is efficient and robust and improves the performance of speech recognition systems in noisy environments.

2. We explore the signal processing mechanisms of the peripheral auditory pathway and simulate the frequency selectivity of the basilar membrane in the cochlea with a bank of cochlear filters, from which we extract the cochlear energy spectrum as a feature. We employ Independent Subspace Analysis (ISA) to project the cochlear energy spectrum into statistically independent linear subspaces and extract the harmonic components of different speakers. The noise components can be reduced by maximizing the independence of the different subspaces. The experimental results show that the proposed method is robust against noise and improves the performance of speaker recognition systems in noisy environments.

3. We develop a new tensor factorization method called Constraint Nonnegative Tensor Factorization (cNTF). Using a sparseness control operator and an orthogonality constraint, we control the sparseness of the tensor basis functions and feature coefficients and extract a local representation of the speech signal. Combined with the cortical representation model, robust sparse Gabor features for speaker recognition are obtained by projecting onto the sparse tensor basis functions learned by the cNTF algorithm. The experimental results show that the resulting Cortical Tensor Cepstral Coefficients (CTCC) features are robust against additive noise and adapt well to various noisy conditions.

4. We propose a new tensor model with three modes (time, frequency and speaker identity) to improve the robustness of speaker recognition in complex conditions. A sparse nonnegative tensor factorization algorithm is employed to learn tensor basis functions that preserve the discriminative information of different speaker identities. The final features are obtained by projecting onto these tensor basis functions. Experimental results show that the Auditory-based Nonnegative Tensor Feature (ANTF) and Auditory-based Nonnegative Tensor Cepstral Coefficients (ANTCC) improve both the performance of speaker recognition systems and the robustness of the speech features.

In summary, this dissertation investigates the problem of robust feature extraction from speech signals in noisy environments. We explore the signal processing mechanisms of the peripheral auditory pathway and the auditory cortex and propose several robust feature extraction methods for speech signals based on auditory perception models and higher-order tensor factorization. The proposed methods improve the performance and robustness of current speech recognition and speaker recognition systems.
Keywords/Search Tags: Feature Extraction, Auditory Perception, Speech Signal Processing, Speech Recognition, Speaker Recognition, Tensor Factorization, Principal Component Analysis, Nonnegative Matrix Factorization, Independent Component Analysis
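
The abstract mentions learning nonnegative tensor basis functions from a three-mode tensor (time, frequency and speaker identity) with a sparse nonnegative tensor factorization. As a rough illustration of that general idea only, the Python sketch below fits a nonnegative CP (PARAFAC) decomposition using multiplicative updates with an L1 penalty. It is not the algorithm from the dissertation, whose objective and update rules are not given in the abstract. The CP model, the function names (`khatri_rao`, `unfold`, `reconstruct`, `sparse_ntf`) and the `sparsity` parameter are all assumptions made for this example.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product: (I x R) and (J x R) -> (I*J x R)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def unfold(x, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest in C order."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def reconstruct(factors):
    """Rebuild the full tensor as a sum of rank-one outer products."""
    shape = tuple(f.shape[0] for f in factors)
    kr = factors[0]
    for f in factors[1:]:
        kr = khatri_rao(kr, f)
    return kr.sum(axis=1).reshape(shape)


def sparse_ntf(x, rank, n_iter=200, sparsity=0.0, seed=0, eps=1e-9):
    """Nonnegative CP factorization with an L1 penalty (illustrative sketch).

    x        : nonnegative array, e.g. shape (time, frequency, speaker)
    rank     : number of rank-one components (hypothetical choice)
    sparsity : L1 weight added to the denominator of each update;
               larger values push factor entries toward zero
    """
    rng = np.random.default_rng(seed)
    # Strictly positive initialization keeps the multiplicative updates valid.
    factors = [rng.random((dim, rank)) + 0.1 for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(x.ndim):
            # Khatri-Rao product of all other factors, ordered to match unfold().
            others = [factors[m] for m in range(x.ndim) if m != mode]
            kr = others[0]
            for f in others[1:]:
                kr = khatri_rao(kr, f)
            numer = unfold(x, mode) @ kr
            denom = factors[mode] @ (kr.T @ kr) + sparsity + eps
            factors[mode] *= numer / denom  # entries stay nonnegative
    return factors
```

For a nonnegative tensor `x` of shape (time, frequency, speaker), a call such as `sparse_ntf(x, rank=3, sparsity=0.1)` returns one nonnegative factor matrix per mode. In a pipeline of the kind the abstract describes, the time and frequency factors would play the role of basis functions onto which new spectra are projected, but that projection step is not shown here.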