Audio Retrieval Resisting To Speed-Change

Posted on: 2021-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: R J Chu
GTID: 2518306113951619  Subject: Computer Science and Technology
Abstract/Summary:
Audio retrieval matches a short query snippet against a reference audio database in order to return detailed information about it, and it is the core technology of numerous audio applications. An audio retrieval system must be efficient, and it must also remain accurate under various distortions. The Philips fingerprint (PF), one of the representative audio fingerprinting methods, resists many types of distortion, and many efficient retrieval methods have been built on it. However, the Philips fingerprint combined with these efficient retrieval methods still cannot form an ideal audio retrieval system, because it cannot resist speed-change, a distortion that alters both the frequency content and the playing speed of the audio. Philips fingerprints are extracted from fixed energy bands on the spectrogram, so whenever the frequency changes, the information within each band changes and the fingerprints no longer match. Follow-up research has focused on making the Philips fingerprint robust to speed-change by extracting fingerprints from scale-invariant information in the audio, but these approaches yield inferior performance. Audio retrieval with both high robustness and high efficiency could be achieved if the Philips fingerprint were improved to resist speed-change while the dedicated retrieval methods continued to work.

The Enhanced Sampling and Counting method (eSC), which is based on the Philips fingerprint, is a state-of-the-art retrieval method that offers both robustness and efficiency. It proposes that time-stretch distortion can be resolved as long as an appropriate retrieval strategy is employed. The distortion that time-stretch imposes on audio is similar to the change in playing speed caused by speed-change, and it has only a minor influence on the fingerprint. eSC thereby demonstrates that the distortion of playing speed is distinct from the frequency change, and that it is the frequency change that causes the Philips fingerprint to mismatch. The frequency change caused by speed-change is the same as the change that pitch-shift imposes on audio frequency. This means that the frequency change caused by speed-change can be eliminated during fingerprint extraction, after which the extracted fingerprints behave like time-stretched fingerprints that eSC can retrieve.

By analyzing how eSC resists time-stretch distortion using the consecutive feature of the Philips fingerprint, this thesis proposes a two-stage joint processing framework for speed-change distortion. Speed-change is regarded as the combination of pitch-shift and time-stretch. In the first stage, the frequency change caused by speed-change is erased by a fingerprint extraction method that resists pitch-shift distortion. In the second stage, the remaining influence on playing speed leaves the fingerprints in a time-stretched form that eSC can retrieve. To erase the influence of the frequency change, this thesis proposes a peak-point based Philips fingerprinting method (PPF), which uses a dynamic energy band computation method to erase the frequency change caused by pitch-shift, together with a method to maintain robustness to time-stretch.

The experiments show that PPF can resist pitch-shift and time-stretch ranging from 70% to 130%, as well as other distortions. More importantly, PPF combined with eSC under the joint processing framework can resist speed-change ranging from 70% to 130%, which is a breakthrough among audio retrieval methods based on Philips-like fingerprints. In addition, its retrieval efficiency exceeds that of other related methods. Audio retrieval with both robustness and efficiency is thus achieved.
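
To make the fixed-band sensitivity concrete, the following is a minimal sketch of how a classic Philips-style fingerprint bit is commonly derived from band energies, plus the usual view of speed-change as a joint frequency and duration scaling. It is not the PPF method of the thesis, whose dynamic band computation is not specified in the abstract. The function names `philips_bits` and `speed_change_effect` are hypothetical, and the band energies are assumed to be precomputed from a spectrogram.

```python
from typing import List, Tuple


def philips_bits(band_energy: List[List[float]]) -> List[List[int]]:
    """Derive Philips-style fingerprint bits from band energies.

    band_energy[n][m] is the energy of band m in frame n (assumed input).
    A bit is 1 when the energy difference between adjacent bands
    increases from the previous frame to the current one, else 0.
    """
    bits = []
    for n in range(1, len(band_energy)):
        prev, cur = band_energy[n - 1], band_energy[n]
        row = []
        for m in range(len(cur) - 1):
            # Difference across adjacent bands, then across consecutive frames.
            d = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            row.append(1 if d > 0 else 0)
        bits.append(row)
    return bits


def speed_change_effect(freq_hz: float, duration_s: float,
                        factor: float) -> Tuple[float, float]:
    """Effect of playing audio at `factor` times the original speed.

    Frequencies scale by `factor` (the pitch-shift part) and the
    duration scales by 1 / factor (the time-stretch part).
    """
    return freq_hz * factor, duration_s / factor


if __name__ == "__main__":
    energy = [[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0],
              [1.0, 2.0, 3.0]]
    print(philips_bits(energy))
    print(speed_change_effect(440.0, 10.0, 1.25))
```

Because the bands in this sketch are fixed, a frequency scaling moves spectral content from one band into another and flips bits, whereas a pure duration scaling mainly changes how many consecutive frames carry similar bits. That is the intuition behind handling the two components in separate stages.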
Keywords/Search Tags:Audio Retrieval, Audio Fingerprint, Pitch-shift, Speed-change