Massive Audio Data Management System

Posted on:2015-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:J Zeng

Full Text:PDF

GTID:2308330464458033

Subject:Computer software and theory

Abstract/Summary:

Text information retrieval technology was developed years ago, and search engines such as Google and Baidu are built on this mature technology. Multimedia information retrieval, however, is not yet as efficient as text retrieval, so there are fewer multimedia search products than text search engines. In practice, the multimedia search products on the market are variants of text search engines that operate on the metadata of the multimedia rather than on its content.

Content-based indexing and retrieval of music and audio is still a very active research topic. At the same time, how to efficiently store massive and continually growing audio data is also an open question. This thesis therefore focuses on the management, storage and retrieval of massive audio data, and a well-performing massive audio data management system was built as a result of this research. First, the storage strategy on HDFS is modified, which contributes to load balancing in the cluster and makes the system more scalable and intelligent in data storage. Second, a high-dimensional data clustering algorithm, K-means+, is introduced; it is scalable, fast and semantically effective for indexing. Third, a two-step matching method is proposed that returns not only the correct answer but also the most closely related answers that users may be interested in. It is shown to perform well in recall, precision and response time. Based on the experimental results, the system combining these techniques is capable of managing massive audio data.

Audio feature extraction is the first step in our system. Owing to limited knowledge of that field, however, this thesis does not explain in detail how it works, which is acceptable given that the focus is on data management. Metadata is not used in our system, because the topic is content-based retrieval of music and audio data; using it may be part of future work.
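
The abstract does not give the details of K-means+ or of the two-step matching method, so the following is only a rough sketch of the general idea of a cluster index with coarse-then-fine matching, not the thesis's implementation. Plain Lloyd's k-means stands in for K-means+, the feature vectors are toy 2-D points rather than real audio features, and every name here (`kmeans`, `two_step_query`, `probes`, `top_k`) is hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(v, centroids):
    """Index of the centroid closest to vector v."""
    return min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; a stand-in, NOT the thesis's K-means+."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        assign = [nearest_centroid(v, centroids) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so that labels match the returned centroids.
    assign = [nearest_centroid(v, centroids) for v in vectors]
    return centroids, assign


def two_step_query(query, vectors, centroids, assign, probes=1, top_k=3):
    """Assumed two-step matching: coarse cluster lookup, then fine ranking."""
    # Step 1 (coarse): keep the `probes` clusters whose centroids are closest.
    order = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))
    chosen = set(order[:probes])
    # Step 2 (fine): rank only the members of those clusters by exact distance.
    candidates = [i for i, a in enumerate(assign) if a in chosen]
    return sorted(candidates, key=lambda i: dist(query, vectors[i]))[:top_k]


# Toy data: two well-separated groups standing in for audio feature vectors.
vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, assign = kmeans(vectors, k=2)
hits = two_step_query([10, 10.2], vectors, centroids, assign, probes=1, top_k=2)
print(hits)
```

Under these assumptions, the first hit plays the role of the "correct answer" and the remaining hits are the related items drawn from the same cluster. Raising `probes` trades response time for recall, since more clusters are scanned in the fine step.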

Keywords/Search Tags:

audio, content-based, MapReduce, clustering, data storage, matching

Related items

1	Study On Content-based Audio Retrieval Technology
2	Research On The Clustering Algorithm Of Parallel Partition Based On MapReduce
3	Research, Design And Application Of Clustering Algorithm Using Mapreduce
4	Efficient Retrieval For Massive Audio Based On Content
5	Parallel Clustering Algorithm Based On MapReduce
6	Research On Hierarchical Clustering Algorithm And Parallelization In Massive Data Environment
7	Research And Implementation Of Mapreduce-based Graph Clustering Algorithm
8	Research On Clustering Algorithms Of Location Big Data Based On MapReduce
9	Content-Based Audio Video Searching And Commercial Detection
10	Research And Application Of Clustering Mining Algorithm Oriented Big Data Based On MapReduce