
Research On Key Technologies In Multichannel Speech Signal Processing

Posted on: 2011-12-31    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L Wang    Full Text: PDF
GTID: 1118360305455646    Subject: Signal and Information Processing
Abstract/Summary:
At a cocktail party where many people are talking concurrently, a listener has little difficulty communicating with a specific person. This cocktail-party phenomenon results from the special structure of the human ears together with the sophisticated processing performed by the human brain. Inspired by this principle of the human auditory system, two key technologies in multichannel signal processing are developed: sound field reconstruction and speech separation. The thesis investigates both techniques, aiming to make multichannel signal processing serve human hearing better.

Following the way human ears localize sound, 3D audio (sound field reconstruction) creates the perception of a real three-dimensional sound field by emulating the signals received at the two ears. The transmission of sound from a spatial point to the ears can be regarded as a linear filtering process whose transfer function is termed the head-related transfer function (HRTF). The HRTF provides the essential cues the ears use to localize sound. Convolving a sound source with a pair of HRTFs and playing the result over headphones gives listeners the perception that the sound arrives from the desired spatial point; this is the theoretical foundation of 3D audio. Owing to its simple structure and natural perception, 3D audio has wide potential applications in multimedia, virtual reality, human-machine interaction, home entertainment, psychoacoustics, etc. The research on 3D audio in this thesis is as follows.

(1) Binaural synthesis using generalized HRTFs may produce large localization errors because HRTFs differ greatly between individuals. A subjective comparison method is proposed to select a near-individualized HRTF set from a large number of non-individualized HRTF sets, thereby reducing localization errors.

(2) Common-acoustical-pole/zero (CAPZ) approximation is an efficient way to model HRTFs.
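As background, the CAPZ idea is to let all HRTFs in a set share one common denominator (the poles), so that only the numerator (zero) coefficients vary per direction. A minimal sketch of the per-direction zero fit, assuming the common poles are already known (the thesis estimates them jointly, and by a log-magnitude criterion rather than the plain equation-error least squares used here):

```python
import numpy as np

def fit_capz_zeros(H, omega, a, n_zeros):
    """Fit numerator (zero) coefficients of one HRTF, given the common
    denominator `a` (shared poles), by linear least squares on the
    equation error |B(w) - H(w) A(w)| over the frequency grid."""
    # Fourier bases for the numerator and denominator polynomials in z^-1
    Eb = np.exp(-1j * np.outer(omega, np.arange(n_zeros + 1)))
    Ea = np.exp(-1j * np.outer(omega, np.arange(len(a))))
    target = H * (Ea @ a)                  # H(w) * A(w)
    # stack real and imaginary parts so lstsq works over the reals
    A_mat = np.vstack([Eb.real, Eb.imag])
    y = np.concatenate([target.real, target.imag])
    b, *_ = np.linalg.lstsq(A_mat, y, rcond=None)
    return b

# toy check: a known pole-zero response is recovered exactly
omega = np.linspace(0.01, np.pi - 0.01, 128)
a_true = np.array([1.0, -0.5])             # common poles (denominator)
b_true = np.array([1.0, 0.3, -0.2])        # direction-specific zeros
Ea = np.exp(-1j * np.outer(omega, np.arange(len(a_true))))
Eb = np.exp(-1j * np.outer(omega, np.arange(len(b_true))))
H = (Eb @ b_true) / (Ea @ a_true)
b_est = fit_capz_zeros(H, omega, a_true, n_zeros=2)
print(np.allclose(b_est, b_true, atol=1e-8))   # True
```

The storage saving is the point: the common poles are stored once for the whole HRTF set, and each direction only needs its short zero vector.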
A novel method to estimate the CAPZ model parameters by minimizing the log-magnitude error is presented, which may match human auditory perception better than conventional methods. Based on CAPZ models and a simplified artificial reverberation algorithm, an "out-of-head" stereo enhancement system for headphones is designed to resolve "in-head" localization.

(3) A hybrid compression method is proposed to address the storage problem posed by large numbers of HRTFs. The method jointly employs principal component analysis, vector quantization, and curved-surface fitting to compress the HRTF data, and may reduce the data size greatly while achieving reconstruction precision similar to that of principal component analysis alone.

(4) HRTF interpolation is necessary in binaural synthesis when the source or the head moves. An all-zero interpolation method based on principal component analysis is proposed: the spatial variation of the principal component weights is fitted with a bivariate polynomial in the two spatial angles (azimuth and elevation), and a sphere-partitioning optimization scheme is employed to improve the approximation precision. Building on this all-zero interpolation method, an indirect interpolation method is proposed for pole-zero models.

(5) The crosstalk cancellation problem is investigated for stereo loudspeaker systems. To reduce the computational cost, a stereo crosstalk cancellation algorithm based on the common-acoustical-pole/zero (CAPZ) model is presented: the electroacoustic transfer paths from the loudspeakers to the ears are first approximated with CAPZ models, and the crosstalk cancellation filter is then designed from the CAPZ transfer functions. The proposed method may reduce the computational cost greatly compared with conventional methods.

Blind source separation is a new technique in array signal processing and statistical analysis.
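As a minimal illustration of the blind-separation idea, the toy below separates an instantaneous, noise-free two-source mixture (not the convolutive case treated in the thesis) by whitening the mixtures and then grid-searching the remaining rotation for maximal non-Gaussianity, a simplified stand-in for standard ICA; the mixing matrix and source statistics are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# two independent non-Gaussian sources, instantaneously mixed
n = 20000
S = rng.uniform(-1, 1, size=(2, n))          # sources (uniform -> non-Gaussian)
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # unknown mixing matrix
X = A @ S                                     # observed mixtures

# whiten the mixtures to identity covariance
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / n)
Z = (E / np.sqrt(d)) @ E.T @ X

# after whitening only a rotation remains: grid-search the angle that
# maximizes total non-Gaussianity (|excess kurtosis|) of the outputs
def nongaussianity(theta):
    c, s = np.cos(theta), np.sin(theta)
    Y = np.array([[c, s], [-s, c]]) @ Z
    k = (Y**4).mean(axis=1) - 3 * (Y**2).mean(axis=1) ** 2
    return np.abs(k).sum()

thetas = np.linspace(0, np.pi / 2, 200)      # pi/2 is enough: beyond it the
best = thetas[np.argmax([nongaussianity(t) for t in thetas])]   # outputs permute
c, s = np.cos(best), np.sin(best)
Y = np.array([[c, s], [-s, c]]) @ Z          # separated, up to perm./scale

# each recovered signal should correlate near +-1 with one true source
C = np.abs(np.corrcoef(np.vstack([Y, S]))[:2, 2:])
print(C.max(axis=1))                          # both entries close to 1.0
```

The permutation and scaling ambiguities visible here are intrinsic to blind separation; in the frequency-domain convolutive setting they recur independently in every frequency bin, which is exactly the alignment problem addressed below.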
Blind source separation attempts to recover the independent sources from observed mixtures, with no prior knowledge of the sources or the mixing process. In speech blind separation, the key problem is to segregate the speech of a specific speaker from multichannel speech mixtures. The challenge in real-world applications is that the sources are convolved with room impulse responses, i.e., convolutive blind separation. The research on speech blind source separation in this thesis is as follows.

(1) Frequency-domain convolutive blind source separation is investigated, and a new alignment method is presented based on an inter-frequency dependence measure: the powers of the separated signals. The region-growing permutation alignment scheme solves the permutation ambiguity problem efficiently with little computation.

(2) To improve separation performance in reverberant environments, a combined method that integrates the advantages of blind source separation and beamforming is proposed. With beamforming reducing reverberation and enhancing the signal-to-noise ratio as a preprocessing step for blind source separation, the combined method performs better under high reverberation. Moreover, emulating the principle of the human auditory system, the combined method is extended to the extraction of target speech in noisy cocktail-party environments.
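The inter-frequency power dependence exploited for permutation alignment can be sketched as follows. This is a deliberately simplified sequential variant with a running reference, not the thesis's region-growing scheme, and the simulated power envelopes (one activity pattern per source, shared across bins, with invented scales and noise) stand in for actual separated STFT outputs:

```python
import numpy as np

rng = np.random.default_rng(1)

def corr(u, v):
    u = u - u.mean(); v = v - v.mean()
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# simulated per-bin power envelopes: each source's activity pattern is
# shared across frequency bins -- the dependence used for alignment
frames, n_bins = 400, 64
env = np.abs(rng.normal(size=(2, frames)))        # per-source envelopes
perm_true = rng.integers(0, 2, size=n_bins)       # unknown per-bin swap
P = np.empty((n_bins, 2, frames))
for f in range(n_bins):
    order = [0, 1] if perm_true[f] == 0 else [1, 0]
    # bin-dependent scaling plus small noise, rows possibly swapped
    P[f] = env[order] * rng.uniform(0.5, 2.0) + 0.05 * rng.random((2, frames))

# align each bin against a running reference by choosing the permutation
# that maximizes the summed power correlation
ref = P[0].copy()
perm_est = np.zeros(n_bins, dtype=int)
for f in range(1, n_bins):
    keep = corr(P[f, 0], ref[0]) + corr(P[f, 1], ref[1])
    swap = corr(P[f, 0], ref[1]) + corr(P[f, 1], ref[0])
    if swap > keep:
        P[f] = P[f, ::-1]
        perm_est[f] = 1
    ref = 0.9 * ref + 0.1 * P[f]                  # update running reference

# estimated swaps must equal the true ones relative to bin 0
print(np.array_equal((perm_est + perm_true[0]) % 2, perm_true))  # True
```

With more than two sources the pairwise comparison becomes an assignment problem over all permutations of a bin's outputs, but the correlation-of-powers criterion is the same.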
Keywords/Search Tags:3D audio, Head-related transfer function, Common-acoustical pole/zero model, Data compression, Interpolation, Crosstalk cancellation, Blind Source Separation, Permutation, Beamforming