
Research On Storage And Search For Massive Data Of Audio Fingerprinting

Posted on: 2015-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: R T Wang
Full Text: PDF
GTID: 2298330452959581
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the big data era, the world is producing data at an exponentially increasing rate, especially multimedia data such as images, audio and video. How to manage and use this data effectively to provide greater convenience is one of the fundamental problems of the information age. With the development of techniques in pattern recognition, machine learning and cloud computing, content-based multimedia search has emerged. Compared with traditional keyword-based search, content-based search is independent of tags and keywords, and offers more accurate results and more convenient search methods.

As an important component of multimedia data, audio data is also growing fast. The key problem people face is no longer a lack of data, but how to find the data they want within massive collections. Retrieving audio from large-scale databases effectively and efficiently is a major challenge for both academia and industry.

Audio fingerprinting is one content-based audio search method. By extracting digital features, called audio fingerprints, from an unknown audio segment, and then searching and computing similarities against a prepared audio fingerprint database, we can obtain detailed information about that audio. This method avoids the problems of missing or incorrect tags that affect traditional keyword-based search. It can also help users find what they want even when they do not know the keywords.

Algorithms for extracting and matching audio fingerprints have achieved significant results in some laboratories, and have been applied to some products with relatively small datasets. However, large-scale datasets introduce performance bottlenecks as well as problems of concurrency and extensibility.

Building on in-depth research into audio fingerprint extraction and matching algorithms, this paper designs, implements and optimizes the storage and retrieval of massive audio fingerprints.
This paper first introduces a hash-based structure for audio fingerprints and two distributed hash strategies, and demonstrates the effectiveness of these methods through experiments. On this basis, a distributed serialization solution is introduced and shown to be effective. The storage structure and distributed solutions offer multilevel concurrency, high performance, fault tolerance and easy extensibility. These results have practical value for constructing large-scale audio fingerprint retrieval systems and are significant for the application of audio fingerprinting technologies in modern society.
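
The abstract does not give the details of the hash-based structure or of the two distributed hash strategies, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes each fingerprint is a sequence of 32-bit sub-fingerprints, that lookup is by exact match, and that sub-fingerprints are routed to storage nodes by a stable hash modulo the node count. The class name `ShardedFingerprintIndex`, its methods and the sample values are all hypothetical.

```python
from collections import Counter, defaultdict
import zlib


class ShardedFingerprintIndex:
    """Toy inverted index: sub-fingerprint -> list of (track_id, frame offset)."""

    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        # One in-memory dict per shard stands in for one storage node.
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def _shard_for(self, sub_fp):
        # Assumed routing strategy: stable CRC32 of the 32-bit value,
        # modulo the number of shards.
        return zlib.crc32(sub_fp.to_bytes(4, "big")) % self.num_shards

    def add_track(self, track_id, sub_fps):
        # Store every sub-fingerprint together with its position in the track.
        for offset, fp in enumerate(sub_fps):
            self.shards[self._shard_for(fp)][fp].append((track_id, offset))

    def query(self, sub_fps):
        # Each exact hit votes for (track, alignment); the best-supported
        # alignment wins. Real systems also tolerate bit errors.
        votes = Counter()
        for q_offset, fp in enumerate(sub_fps):
            shard = self.shards[self._shard_for(fp)]
            for track_id, offset in shard.get(fp, ()):
                votes[(track_id, offset - q_offset)] += 1
        if not votes:
            return None
        (track_id, delta), count = votes.most_common(1)[0]
        return track_id, delta, count


index = ShardedFingerprintIndex(num_shards=4)
index.add_track("song_a", [11, 22, 33, 44, 55, 66])
index.add_track("song_b", [77, 22, 88, 99, 11, 12])

# A three-frame excerpt taken from the middle of song_a.
result = index.query([33, 44, 55])
print(result)
```

Because every sub-fingerprint maps to exactly one shard, a query touches only the shards that own its hashes, and those lookups could in principle run concurrently. Hash-modulo routing reshuffles most keys when the shard count changes, which is one reason a real system might prefer a different distribution strategy.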
Keywords/Search Tags: audio fingerprinting, big data, storage and retrieval, distributed storage