
Research On Auditory Sound Localization And Coding Technology

Posted on: 2012-11-06 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: C Zhou
GTID: 1228330395458620 | Subject: Computer application technology
Abstract/Summary:
3D film and TV create an immersive visual environment for audiences, but the auditory experience provided by traditional audio technology still lags behind the 3D visual experience: the development of 3D audio technology has not kept pace with that of 3D video technology. In traditional multimedia technology, our development has trailed Western countries for many years, in a "following" state. In the nascent field of 3D audio technology, however, we have the opportunity to compete with the world's developed countries from the same starting line, and thus to leap from "following" to "leading". Many problems in 3D audio technology remain to be solved:
1) Current 3D audio technology derives the sound field from planar recordings, using effects such as wind and rain to create a virtual three-dimensional sense of space. Because the three-dimensional information of the sound source is not truly extracted, it cannot reproduce a true three-dimensional sound field.
2) The frequencies of spectral notches and peaks are prominent cues in human elevation perception, but 3D VBAP (vector base amplitude panning) positions virtual sources only through the amplitude gains of the channels, so the panned angles differ from the perceived directions.
3) As traditional audio events in the horizontal plane are extended vertically to create the illusion of audio events placed anywhere in 3D space, the resulting large and complex 3D audio data demand more efficient compression.
Supported by the National Scientific Important Project "Research On Key Technologies of Future Mobile Multimedia Video/Audio Codec" (No. 2010ZX03004-003), the National Science Foundation of China project "Research on Key Theory and Technology of Mobile Audio Coding" (No. 60832002) and the 2010 Wuhan University Ph.D. candidate self-research program "Research on Spatial Cues-based Security Monitoring Sound Localization and Separation Technology" (No. 20102110101000099), this dissertation studies the acquisition of spatial information and
the compression of massive spatial information. It investigates horizontal sound source localization based on spatial cues, which provides theoretical support for spatial information acquisition; vertical sound source localization based on spectral cues, which provides theoretical support for spatial information acquisition and 3D sound scene synthesis; and spatial audio prediction coding based on the inter-frame difference distribution of spatial cues, which provides theoretical support for compressing massive spatial information. The main contributions of the dissertation are as follows:

(1) Moving Sound Source Horizontal Localization Model by Doppler Effect Removal.
The spatial cues ITD (interaural time difference) and ILD (interaural level difference) carry sound localization information and play a very important role in binaural localization systems. Binaural localization methods usually estimate the position of a sound source with a probabilistic model that measures ITDs and ILDs separately at a number of frequencies and processes these measurements jointly. Combining ITDs and ILDs in this way yields very high accuracy for static sound sources. However, because binaural cues are highly correlated with frequency, the Doppler effect, which describes the change in received frequency from a moving sound source, must be taken into account. An efficient horizontal localization model for moving sound sources, based on joint estimation of ITD and ILD with the Doppler effect removed, is investigated. Results show that, by removing the influence of the Doppler effect, the proposed model improves accuracy by 0.3% (velocity = 1 m/s), 5.7% (velocity = 5 m/s) and 10.5% (velocity = 10 m/s) in silent conditions; the improvement grows as the source moves faster.

(2) Sound Source Vertical Localization Model Based on Spectral Cues.
Spectral cue-based localization relies on the statistical distribution between spectral cues and elevation. A serious problem is that spectral cues depend strongly on the source signal, so a single statistical distribution cannot be applied to different sound sources. An efficient vertical localization model based on spectral cues is investigated. For noise, speech and music signals respectively, different key features (peaks and notches) that determine the vertical position of the sound source are selected, and the statistical distribution between these key features and elevation is obtained; the elevation of a sound source can then be estimated from its key features and that distribution. Results show that the proposed model improves accuracy by 2.3% (noise), 6.6% (speech) and 16.4% (music) in silent conditions.

(3) Higher-order Prediction Model of Spatial Cues Based on the Inter-frame Difference Distribution of Spatial Cues.
To minimize bit rates, the spatial cue side information must be quantized efficiently for transmission. In EAAC+ and MPEG Surround, intra-frame differential coding removes the redundancy remaining in the spatial cues of adjacent sub-bands, while inter-frame differential coding removes the redundancy of spatial cues between two adjacent frames to obtain further quantization gain. However, both methods leave room for improvement because they do not consider the inter-frame difference distribution of spatial cues. Efficient compression of spatial cues based on this distribution is investigated: using a Bayesian Gradient model, inter-frame correlations can be predicted more accurately.
Results show that the proposed higher-order prediction method for spatial cue compression achieves about 20% bit-rate reduction with respect to the inter-frequency differential coding method used in MPEG Surround.

In conclusion, this dissertation explores a moving sound source horizontal localization model by Doppler effect removal, a sound source vertical localization model based on spectral cues, and a higher-order prediction model of spatial cues based on the inter-frame difference distribution of spatial cues. These models have important theoretical significance and application value for spatial information acquisition and the compression of massive spatial information. Finally, the dissertation summarizes its contributions and points out future work.
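Problem 2) above turns on how VBAP sets its gains. The following is a minimal sketch of 3D VBAP for a single loudspeaker triplet, assuming the standard formulation in which the source direction is written as a gain-weighted sum of the three loudspeaker direction vectors; the function names and the coordinate convention (x front, y left, z up) are assumptions for illustration only. Because the gains depend only on direction vectors, nothing in this computation reproduces the spectral notches and peaks listeners use for elevation, which is the mismatch the abstract describes.

```python
import math


def unit_vector(azimuth_deg, elevation_deg):
    """Cartesian unit vector for a direction in degrees (assumed: x front, y left, z up)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c (scalar triple product)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def vbap_gains(source, l1, l2, l3):
    """Gains (g1, g2, g3) with g1*l1 + g2*l2 + g3*l3 pointing at `source`, scaled to unit power.

    Solved with Cramer's rule. A negative gain means the source lies outside
    this loudspeaker triplet and a different triplet should be used.
    """
    d = det3(l1, l2, l3)
    if abs(d) < 1e-12:
        raise ValueError("loudspeaker directions are coplanar")
    g = (det3(source, l2, l3) / d,
         det3(l1, source, l3) / d,
         det3(l1, l2, source) / d)
    # Normalize so that g1^2 + g2^2 + g3^2 = 1 (constant loudness while panning).
    norm = math.sqrt(sum(x * x for x in g))
    return tuple(x / norm for x in g)
```

For example, `vbap_gains(unit_vector(0, 20), unit_vector(-30, 0), unit_vector(30, 0), unit_vector(0, 45))` pans a source at 20 degrees elevation across two ear-level loudspeakers and one elevated loudspeaker; a full renderer would first search for the triplet that gives all non-negative gains.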
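The binaural cues and the Doppler correction in contribution (1) can be illustrated with a small broadband sketch. It is not the dissertation's probabilistic, per-frequency model: ITD is taken as the peak of the interaural cross-correlation, ILD as the left/right energy ratio in dB, and the Doppler step simply maps a received frequency back to the emitted one. The speed of sound (343 m/s), the stationary listener and a known radial source velocity are all assumptions, as are the function names.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def estimate_itd(left, right, fs, max_lag):
    """ITD in seconds from the cross-correlation peak over +/- max_lag samples.

    Assumes equal-length signals. A positive value means the right-ear signal
    lags the left-ear signal (the sound reached the left ear first).
    """
    n = len(left)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        val = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag / fs


def estimate_ild(left, right):
    """ILD in dB, computed as the ratio of left-ear to right-ear signal energy."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)


def remove_doppler(f_received, radial_velocity, c=SPEED_OF_SOUND):
    """Map a received frequency (Hz) back to the frequency emitted by the source.

    Uses f_received = f_source * c / (c - v) for a stationary listener, where v
    is the source's radial velocity in m/s (positive when approaching), so
    f_source = f_received * (c - v) / c.
    """
    return f_received * (c - radial_velocity) / c
```

In a frequency-dependent localizer, a correction like `remove_doppler` would be applied to each analysis frequency before the measured ITDs and ILDs are compared with cue templates indexed by frequency, so that a moving source is not matched against the wrong bands.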
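Contribution (2) estimates elevation from key spectral features through a learned statistical distribution. A minimal sketch of that idea, under stated assumptions: the only feature is the deepest spectral notch inside an assumed 4 to 12 kHz band, and each elevation is modelled by a Gaussian over notch frequency, with the most likely elevation chosen. The band limits, the Gaussian model and all names are hypothetical; the dissertation selects different peak and notch features per signal class, which here would correspond to keeping one model table per class.

```python
import math


def find_notch(freqs, mags_db, f_lo=4000.0, f_hi=12000.0):
    """Frequency of the deepest local minimum of the magnitude spectrum in [f_lo, f_hi].

    Returns None when no local minimum lies inside the band.
    """
    best_f, best_m = None, math.inf
    for k in range(1, len(freqs) - 1):
        in_band = f_lo <= freqs[k] <= f_hi
        is_min = mags_db[k] < mags_db[k - 1] and mags_db[k] < mags_db[k + 1]
        if in_band and is_min and mags_db[k] < best_m:
            best_f, best_m = freqs[k], mags_db[k]
    return best_f


def gaussian_log_likelihood(x, mean, std):
    """Log density of a normal distribution with the given mean and std at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def estimate_elevation(notch_hz, model):
    """Most likely elevation for a notch frequency.

    `model` (hypothetical) maps elevation in degrees -> (mean notch Hz, std Hz),
    estimated beforehand from training data for one signal class.
    """
    return max(model, key=lambda el: gaussian_log_likelihood(notch_hz, *model[el]))
```

In use, a table such as a hypothetical `models["speech"]` would be fitted from labelled recordings, and the notch returned by `find_notch` passed to `estimate_elevation` together with the table for the detected signal class. Any numbers placed in such a table for testing are toy values, not measurements.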
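The difference between intra-frame and inter-frame differential coding in contribution (3), and the idea of a higher-order predictor, can be shown on quantized cue indices. This sketch is not the dissertation's Bayesian Gradient model, which the abstract does not specify; it swaps in a plain second-order linear extrapolation across frames (predict 2 * previous minus the frame before), and uses the sum of absolute residuals as a rough stand-in for bit cost, whereas real codecs entropy-code the residuals. All function names are hypothetical.

```python
def intra_frame_diff(frame):
    """Intra-frame coding: first band as-is, then differences between adjacent sub-bands."""
    return [frame[0]] + [frame[b] - frame[b - 1] for b in range(1, len(frame))]


def inter_frame_diff(prev, cur):
    """Inter-frame coding: per-band difference from the same band in the previous frame."""
    return [c - p for p, c in zip(prev, cur)]


def second_order_residual(prev2, prev1, cur):
    """Residual after predicting each band by linear extrapolation 2*prev1 - prev2."""
    return [c - (2 * p1 - p2) for p2, p1, c in zip(prev2, prev1, cur)]


def decode_second_order(prev2, prev1, residual):
    """Decoder side: rebuild the current frame from the two previous frames and the residual."""
    return [r + 2 * p1 - p2 for p2, p1, r in zip(prev2, prev1, residual)]


def cost(residuals):
    """Crude proxy for bit cost: total magnitude of the values to be transmitted."""
    return sum(abs(r) for r in residuals)
```

For cues that drift steadily over time, such as the frames `[0, 2, 4, 6]`, `[1, 3, 5, 7]`, `[2, 4, 6, 8]`, the cost falls from intra-frame to inter-frame to second-order coding of the third frame, which is the kind of redundancy a better inter-frame predictor is meant to exploit.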
Keywords/Search Tags: 3D audio, Spatial cues, Doppler effect, Spectral cues, Spatial audio, Prediction coding