Distributed Implementation Of The Massive Audio Retrieval Algorithm

Posted on: 2019-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xin
GTID: 2348330569479989
Subject: Software engineering
Abstract/Summary:
Content-based audio retrieval is widely used in audio identification, tamper detection for digital audio content, humming retrieval, broadcast monitoring, and music recommendation. The robustness of the audio fingerprint and the efficiency of the retrieval algorithm directly affect the user experience and are key factors for audio retrieval systems. Audio fingerprint extraction algorithms and audio retrieval algorithms have already achieved good results in terms of robustness and accuracy. When these algorithms are applied to massive audio data sets, however, a standalone server cannot provide the required storage capacity, retrieval speed declines, and scalability is limited.

To address these problems, a distributed implementation of the sampling-counting (SC) audio retrieval algorithm is proposed. SC is one of the most efficient audio retrieval algorithms for a standalone server. A serialized Fibonacci hash table and a grouped (segmented) distributed index are employed to solve the two key issues of distributed audio retrieval systems: the choice of the fingerprint index structure and the distribution of the fingerprint index.

A serialized Fibonacci hash table saves storage space without slowing down the search. The Fibonacci hash function reduces the number of hash buckets, and serializing the hash table reduces the memory used by each hash bucket, which improves memory utilization.

The grouped index structure divides the N = S × M data nodes into S groups of M data nodes each. Between groups, a local index structure is used, which reduces the data volume of each group's index. Within a group, a global index structure is adopted: each group's hash table is divided into M equal shares, which are distributed to the M data nodes in the group. Using a global index for the hash table within a group reduces the number of data nodes in the group that take part in a retrieval task. When the index search is completed, only the search results of the data nodes in the group corresponding to the audio data set need to be aggregated, which reduces the communication cost of the cluster.

The experimental results show that applying the serialized Fibonacci hash table and the grouped index partitioning method to the sampling-counting audio retrieval algorithm effectively shortens retrieval time, reduces cluster communication, and improves memory utilization while maintaining accuracy and recall.
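The grouped index layout described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes 32-bit sub-fingerprints (as in the Philips fingerprint), the common 32-bit Fibonacci hashing constant, a power-of-two bucket count, contiguous bucket ranges per node, and that a query is sent to every group because each group indexes a different part of the audio set. All function and parameter names are hypothetical.

```python
from typing import List

# floor(2**32 / golden ratio): the usual 32-bit Fibonacci hashing multiplier.
# Assumption: the abstract does not state which constant or word size is used.
FIB_MULT_32 = 2654435769


def fib_bucket(fingerprint: int, bucket_bits: int) -> int:
    """Map a 32-bit sub-fingerprint to one of 2**bucket_bits hash buckets."""
    product = (fingerprint * FIB_MULT_32) & 0xFFFFFFFF  # keep the low 32 bits
    return product >> (32 - bucket_bits)                # use the top bucket_bits bits


def node_in_group(bucket: int, bucket_bits: int, m: int) -> int:
    """Local node index (0..m-1) that owns this bucket.

    The group's hash table is cut into m contiguous, near-equal shares
    (the "global index" inside a group).
    """
    return bucket * m // (1 << bucket_bits)


def nodes_for_query(fingerprint: int, s: int, m: int, bucket_bits: int) -> List[int]:
    """Global node ids (0..s*m-1) contacted for one sub-fingerprint.

    Assumption: every group holds the index of a different subset of the
    audio data (the "local index" between groups), so one node per group
    is contacted, i.e. s nodes instead of all s*m.
    """
    local = node_in_group(fib_bucket(fingerprint, bucket_bits), bucket_bits, m)
    return [group * m + local for group in range(s)]


if __name__ == "__main__":
    # 12 data nodes arranged as S = 3 groups of M = 4 nodes, 2**16 buckets per group.
    print(nodes_for_query(0x1, s=3, m=4, bucket_bits=16))
```

Under these assumptions a single sub-fingerprint lookup touches S nodes rather than all N = S × M, which is one way to read the abstract's claim that fewer data nodes take part in each retrieval task.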
Keywords/Search Tags: Philips audio fingerprint, distributed audio retrieval, sampling-counting retrieval algorithm, fingerprint index structure, distributed structure