Research On The Key Technology Of Big Audio Retrieval

Posted on: 2019-04-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S S Yao
Full Text: PDF
GTID: 1368330596482312
Subject: Computer application technology
Abstract/Summary:
Audio retrieval is one of the core technologies for managing audio data and is widely used in tasks such as music identification, advertising monitoring, and copyright protection. Traditional audio retrieval techniques focus on improving retrieval precision and recall by developing appropriate audio fingerprints and fingerprint indexes. In the era of big data, audio data is both high in dimension and huge in quantity. At the same time, the pursuit of faster speed and higher performance has never stopped as technology progresses. Efficiency has therefore become the primary focus of audio retrieval.

Corresponding to the high dimensionality and large volume of big audio data, there are two approaches to improving retrieval efficiency. First, data reduction techniques address the high dimensionality: they reduce the data volume of fingerprints, which simplifies the subsequent retrieval and matching computations and thereby accelerates audio retrieval significantly. However, with this approach the fingerprint dimension remains high, and retrieval precision and recall are compromised to some extent. Considerable research has focused on simplifying the fingerprint extraction process, while little attention has been paid to further reducing the data volume of fingerprints by applying big data techniques. Second, data filtering techniques address the large volume: they quickly eliminate a large number of irrelevant audio items, reducing the number of fingerprints to be matched. However, the retrieval precision and recall of this approach depend heavily on the robustness of the selected fingerprints. Existing research primarily focuses on filtering by indexing, but the candidate set obtained by indexing is still too large for big audio data.

Based on a review of research on big audio retrieval, this dissertation studies methods and techniques for high-dimensional data reduction, large-volume data filtering, efficient retrieval strategies, and fingerprint matching, in order to achieve efficient big audio retrieval. The work takes audio data management as its primary means and seeks an optimal combination of audio fingerprint and retrieval strategy by developing innovative methods and techniques for organizing and processing fingerprint data and by combining data reduction with data filtering, so as to reduce the memory footprint and achieve efficient audio retrieval while preserving retrieval precision and recall. The main contributions and innovations of this dissertation are as follows:

(1) Two methods for high-dimensional data reduction are proposed. The first generates a coarse-grained middle fingerprint based on dimension reduction and Bag-of-Features (BoF). The second is a sampling method featuring cross-interval random sampling. Both methods generate a set of reduced fingerprints with much less data and reduce the amount of data to be processed by orders of magnitude.

(2) Two multi-stage filtering techniques are proposed for big audio data. Several filtering techniques are proposed, including middle fingerprint filtering, fingerprint interval threshold filtering, and counting-and-sorting filtering with a dynamic threshold, and they are combined with Fibonacci index filtering to enhance the filtering ability. Two multi-stage combinations are proposed. The first is a three-stage filter with dimensionality reduction as its primary means, consisting of Fibonacci index filtering, middle fingerprint filtering, and fingerprint interval threshold filtering. The second is a three-stage filter with sampling as its primary means, consisting of Fibonacci index sampling filtering, counting-and-sorting filtering with a dynamic threshold, and fingerprint interval threshold filtering. Both combinations quickly eliminate a large number of irrelevant audio items, significantly reduce the number of candidates to be matched, and thus increase retrieval speed by several orders of magnitude.

(3) Two efficient retrieval methods are proposed. An efficient cascaded filtering-and-verifying retrieval (CFR) method is proposed by combining dimensionality reduction with multi-stage filtering. Its retrieval speed is nearly 70 times faster than that of the best matching method in the experiments, while maintaining accuracy and recall. A sampling and counting retrieval (SC) method is proposed by combining sampling with multi-stage filtering. SC improves on CFR and removes CFR's inability to retrieve audio clips shorter than 6 seconds. Its retrieval speed is on average 27 times faster than that of CFR. In addition, the hash table is restructured: the middle fingerprint database is removed and only the ID of the audio corresponding to each sub-fingerprint is recorded, which saves about 50% of the memory.

(4) A fingerprint matching technique with time-stretch resistance is proposed. By mining the time correlation in Philips fingerprints and matching fingerprints accordingly, a fingerprint matching method called the turning points alignment method with threshold is proposed. Combined with SC, it yields a method called enhanced sampling and counting retrieval with time-stretch resistance (eSC), which overcomes the inability of Philips fingerprints to resist time stretching and realizes an optimized combination of audio fingerprinting and retrieval strategy. The method enables Philips fingerprints to resist time stretching from 70% to 130%, which is equivalent to Quads, the fingerprint with the best time-stretch resistance. It also improves retrieval performance under other noises and distortions, and can be extended to any Philips-like fingerprint retrieval system to enhance its time-stretch resistance.
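
The abstract does not give implementation details for SC, so the following Python sketch is only a rough illustration of the general idea of sampling followed by counting-and-sorting with a dynamic threshold. Everything concrete in it is an assumption made for illustration: the function names, the use of plain integers as sub-fingerprints, exact-match lookups, one random sample per fixed-length interval, and a threshold defined as a fraction (`ratio`) of the highest hit count. The Fibonacci index, fingerprint interval threshold filtering, and final fingerprint matching are not shown.

```python
import random
from collections import Counter, defaultdict


def build_index(database):
    """Map each sub-fingerprint to the set of audio IDs that contain it.

    Only audio IDs are stored, mirroring the abstract's remark that the
    hash table records just the ID of the audio for each sub-fingerprint.
    """
    index = defaultdict(set)
    for audio_id, sub_fingerprints in database.items():
        for sub_fp in sub_fingerprints:
            index[sub_fp].add(audio_id)
    return index


def sample_across_intervals(query, interval_len, rng):
    """Pick one random sub-fingerprint from each consecutive interval.

    This is a hypothetical reading of "cross-interval random sampling".
    """
    samples = []
    for start in range(0, len(query), interval_len):
        interval = query[start:start + interval_len]
        samples.append(rng.choice(interval))
    return samples


def count_and_filter(index, samples, ratio):
    """Count index hits per audio ID, sort, and keep IDs near the top count."""
    counts = Counter()
    for sub_fp in samples:
        for audio_id in index.get(sub_fp, ()):
            counts[audio_id] += 1
    if not counts:
        return []
    ranked = counts.most_common()  # sorted by hit count, descending
    threshold = ratio * ranked[0][1]  # dynamic: scales with the best count
    return [(audio_id, hits) for audio_id, hits in ranked if hits >= threshold]


# Toy database: integers stand in for real sub-fingerprints.
database = {
    "song_a": [11, 12, 13, 14, 15, 16, 17, 18],
    "song_b": [21, 22, 23, 24, 25, 26, 27, 28],
    "song_c": [11, 32, 33, 34, 35, 36, 37, 38],
}
index = build_index(database)
query = database["song_a"][2:8]  # an undistorted excerpt of song_a
samples = sample_across_intervals(query, interval_len=2, rng=random.Random(0))
candidates = count_and_filter(index, samples, ratio=0.5)
print(candidates)
```

In this toy setup only a few sampled sub-fingerprints are looked up instead of the whole query, and audio items with few or no hits are dropped before any expensive matching. A real system would also have to tolerate bit errors in the sub-fingerprints, which exact-match lookups like these do not.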
Keywords/Search Tags: audio retrieval, big audio data management, data reduction, data filtering, audio matching