Massive Audio Data Management System

Posted on: 2015-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Zeng
GTID: 2308330464458033
Subject: Computer software and theory
Abstract/Summary:
Text information retrieval technology was developed years ago, and search engines such as Google and Baidu are built on this mature technology. Multimedia information retrieval, however, is not yet as efficient as text retrieval, so there are far fewer multimedia search products than text search engines. In fact, most multimedia-related products on the market are variants of text search engines that work on the metadata of multimedia rather than on its content.

Content-based indexing and retrieval of music and audio is still a very active research topic. At the same time, how to efficiently store massive and ever-growing volumes of audio data is an interesting question in its own right. This thesis therefore focuses on the management, storage, and retrieval of massive audio data, and builds a well-performing massive audio data management system on the basis of this research. First, the storage strategy of HDFS is modified; this improves load balancing across the cluster and makes the system's data storage more scalable and intelligent. Second, a high-dimensional data clustering algorithm, K-means+, is introduced; it is scalable, fast, and semantically effective for indexing. Third, a two-step matching method is proposed that returns not only the correct answer but also the most closely related answers that users may be interested in; it is shown to perform well in recall, precision, and response time. The experimental results indicate that the system combining these techniques is capable of managing massive audio data.

Audio feature extraction is the first step in the system. Because the author lacks expertise in this field, however, the thesis does not explain in detail how it works; this is acceptable because the focus is on data management. Metadata is not used in the system, since the topic is content-based retrieval of music and audio data; incorporating it may be part of future work.
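The abstract says the HDFS storage strategy was modified to improve load balancing, but it does not describe the change. As a purely illustrative sketch, assuming a policy that puts each block's replicas on the least-utilized DataNodes while spreading them across racks, the selection logic might look like the following. `DataNode`, `choose_targets`, and their parameters are hypothetical names, not HDFS APIs; in a real deployment such logic would have to be written in Java as a custom HDFS block placement policy, and unlike HDFS's default rack-aware policy this sketch ignores where the writing client is located.

```python
from dataclasses import dataclass

@dataclass
class DataNode:
    # Hypothetical view of a DataNode's state, not an HDFS class.
    name: str
    rack: str
    capacity_bytes: int
    used_bytes: int

    @property
    def utilization(self) -> float:
        return self.used_bytes / self.capacity_bytes

def choose_targets(nodes, block_size, replication=3):
    """Hypothetical load-aware placement: least-utilized nodes first,
    one replica per rack before any rack receives a second replica."""
    # Only nodes with room for the whole block are eligible.
    eligible = [n for n in nodes if n.capacity_bytes - n.used_bytes >= block_size]
    eligible.sort(key=lambda n: (n.utilization, n.name))
    chosen, racks = [], set()
    # Pass 1: the least-loaded node on each rack not used yet.
    for n in eligible:
        if len(chosen) < replication and n.rack not in racks:
            chosen.append(n)
            racks.add(n.rack)
    # Pass 2: fill any remaining slots with the least-loaded leftovers.
    for n in eligible:
        if len(chosen) < replication and all(n is not c for c in chosen):
            chosen.append(n)
    if len(chosen) < replication:
        raise RuntimeError("not enough DataNodes with free space")
    # Record the new block so the next call sees the updated load.
    for n in chosen:
        n.used_bytes += block_size
    return [n.name for n in chosen]

nodes = [
    DataNode("a", "r1", 100, 10),
    DataNode("b", "r1", 100, 50),
    DataNode("c", "r2", 100, 20),
    DataNode("d", "r2", 100, 90),
    DataNode("e", "r3", 100, 95),  # too full for a 10-byte block
]
print(choose_targets(nodes, block_size=10))  # ['a', 'c', 'b']
```

Because utilization is updated after every placement, repeated calls steer new blocks away from nodes that are filling up, which is one simple way to keep storage balanced as the audio collection grows.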
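The abstract names K-means+ and a two-step matching method without giving their details. The sketch below assumes the simplest reading: audio feature vectors are clustered offline, a query is first matched to the nearest cluster centroid(s) (step one), and then compared exactly against only those clusters' members (step two), returning the best match plus the next-closest items as related answers. Plain Lloyd's k-means stands in for K-means+; `kmeans`, `two_step_query`, `probe`, and `top_n` are hypothetical names, and the 2-D toy vectors stand in for real high-dimensional audio features.

```python
import math
import random

def assign(points, centroids):
    """Group point indices by their nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for i, p in enumerate(points):
        j = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        clusters[j].append(i)
    return clusters

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means, used here as a stand-in for K-means+."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iters):
        clusters = assign(points, centroids)
        new = []
        for j, members in enumerate(clusters):
            if members:  # move the centroid to the mean of its members
                new.append([sum(points[i][d] for i in members) / len(members)
                            for d in range(dim)])
            else:        # keep an empty cluster's centroid where it was
                new.append(centroids[j])
        if new == centroids:  # centroids stopped moving: converged
            break
        centroids = new
    return centroids, assign(points, centroids)

def two_step_query(query, points, centroids, clusters, probe=1, top_n=3):
    """Step 1: keep the `probe` nearest non-empty clusters by centroid.
    Step 2: rank only their members by exact distance to the query.
    Returns (index of best match, indices of related items)."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    probed = [c for c in order if clusters[c]][:probe]
    candidates = [i for c in probed for i in clusters[c]]
    ranked = sorted(candidates, key=lambda i: math.dist(query, points[i]))
    return ranked[0], ranked[1:top_n]

features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(features, k=2)
best, related = two_step_query((0.2, 0.1), features, centroids, clusters)
print(best, related)  # 0 [2, 1]
```

Step one keeps response time low because exact distances are computed only inside the probed clusters; raising `probe` above 1 trades some of that speed for recall, since a true match near a cluster boundary may fall into a neighboring cluster.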
Keywords/Search Tags: audio, content-based, MapReduce, clustering, data storage, matching