
Extracting Features In Spatial Hearing And Reproduction Of 3D Audio

Posted on: 2012-06-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Tang
Full Text: PDF
GTID: 1118330335481748
Subject: Information and Communication Engineering
Abstract/Summary:
With the progress of multimedia and immersive media, the demand for audio quality is much higher than before. Compared with the video techniques in 3DTV, which can display three-dimensional images, research on rendering three-dimensional audio lags behind. Traditional audio systems such as stereo and multichannel surround systems cannot keep pace with future TV and share the same shortcomings: imprecise source positioning, a limited sweet spot (the area of best spatial impression and immersion), and a fixed sound image. In practical applications, the resulting mismatch between vision and audition easily leads to fatigue. To reproduce 3D audio with a linear loudspeaker array in 3DTV, this dissertation studies the acquisition of 3D audio, feature extraction in spatial hearing, the direction of sound beams, and changes in the distance of the sound image. The primary contributions of the dissertation are as follows:
1. For high-resolution acquisition of 3D audio, an equidistant spherical microphone array is designed and applied to three-dimensional sound source localization using a localization algorithm based on the spherical Fourier transform. The equidistant array is designed by constructing a loss function. According to acoustic principles, the spherical Fourier transform decomposes the recorded sound field into a weighted linear sum of spherical harmonics, which are mutually orthogonal in the spherical harmonic domain. The amplitude density of the sound sources is then obtained by the inverse spherical Fourier transform, which yields the locations of the sound sources.
2. In spatial hearing, we propose a method for extracting features of the head-related transfer function (HRTF) based on a nonlinear manifold learning algorithm, locally linear embedding (LLE), and a method for personalizing HRTFs based on non-negative matrix factorization (NMF) and support vector regression (SVR).
Following the principle by which the brain recognizes objects, from simple to complex and from local to whole, the HRTF database is reduced in dimension and clustered, and representative HRTFs are extracted. All HRTFs can then be interpolated from the representative HRTFs. Meanwhile, the HRTF is a personalized function related to anthropometric parameters. Independent components extracted using NMF serve as the outputs of the training samples, and the anthropometric parameters with large correlation are selected as the inputs. A regression model is derived from these samples, and personalized HRTFs are calculated from the model when new anthropometric parameters are input.
3. For rendering binaural signals over loudspeakers, we propose a method that reduces the eigenvalue diffusion coefficient (EDC) by diagonally loading an identity matrix, which improves the system's robustness. First, the principle of crosstalk cancellation is introduced. Then the model is extended from two listening points and two loudspeakers to multiple listening points and multiple loudspeakers. Through diagonal loading, the EDC of crosstalk cancellation is reduced and the robustness of the reproduction system is improved.
4. For the reproduction of 3D audio, methods for controlling the direction of sound and the distance of the sound image are proposed using a linear loudspeaker array. To integrate into a 3DTV system and simplify the wave field synthesis (WFS) technique, a linear loudspeaker array is used to reproduce the sound field of 3D audio. For the direction of sound, based on an analysis of how the beam depends on the number and spacing of loudspeakers, a multi-beam system composed of beam units is realized in the linear loudspeaker array; it is compatible with multichannel surround systems. For the sound image, we design a two-dimensional FIR filter to control its distance. The arrival time delays of the different wavefronts serve as the group delay of the filter.
To handle broadband signals, beams of the desired width are formed by designing a wedge-shaped transition band. As a result, the virtual sound image is moved closer to the listeners, giving them a feeling of immersion.
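The diagonal loading step described in contribution 3 can be illustrated with a minimal sketch. It is not the dissertation's implementation: the 2x2 plant matrix `H`, the loading value `beta`, and the function names are hypothetical, and the eigenvalue diffusion coefficient is interpreted here as the ratio of the largest to the smallest eigenvalue of the matrix being inverted, which is an assumption about the abstract's terminology.

```python
import numpy as np


def crosstalk_filters(H, beta):
    """Diagonally loaded least-squares inverse of the plant matrix H.

    H has shape (listening points, loudspeakers); beta >= 0 is the
    amount of identity-matrix loading (hypothetical parameter name).
    """
    n_spk = H.shape[1]
    gram = H.conj().T @ H
    # Solve (H^H H + beta I) C = H^H for the filter matrix C.
    return np.linalg.solve(gram + beta * np.eye(n_spk), H.conj().T)


def eigenvalue_spread(H, beta):
    """Ratio of largest to smallest eigenvalue of H^H H + beta I.

    Assumed stand-in for the abstract's eigenvalue diffusion coefficient.
    """
    gram = H.conj().T @ H
    eig = np.linalg.eigvalsh(gram + beta * np.eye(H.shape[1]))
    return eig.max() / eig.min()


# Hypothetical plant at one frequency: two loudspeakers, two listening
# points, with strong crosstalk so that H is nearly singular.
H = np.array([[1.0, 0.95],
              [0.95, 1.0]])

spread_plain = eigenvalue_spread(H, 0.0)    # no loading
spread_loaded = eigenvalue_spread(H, 0.05)  # with diagonal loading
C = crosstalk_filters(H, 0.05)

print(spread_plain, spread_loaded)
```

For this example plant the spread drops from roughly 1521 to roughly 73. The trade-off is that the loaded filters no longer invert `H` exactly, so some residual crosstalk is accepted in exchange for a better-conditioned, more robust system.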
Keywords/Search Tags: Three-dimensional audio, Spherical microphone array, Spatial hearing, Head-related transfer function, Linear loudspeaker array