Distributed Analysis And Retrieval Of Massive Unstructured Data

Posted on: 2013-09-04; Degree: Master; Type: Thesis
Country: China; Candidate: B Yu; Full Text: PDF
GTID: 2268330395989227; Subject: Computer applications
Abstract/Summary:
The advent of Web 2.0 has brought us into an era of Big Data. According to statistics, 80% of data is unstructured data (UD), which includes images, audio, and video. Given the scale of big data, finding an effective way to extract valuable information from massive data has become a hot topic in recent years. The contributions of this thesis are as follows.

First, we propose a distributed analysis system for UD (UDAS). To extract valuable information from vast amounts of UD, the primary task is to extract features of the UD through analysis. On the one hand, the diversity of UD makes analysis complex; on the other hand, the scale of UD demands an efficient analysis system. We therefore implement an effective and extensible system called UDAS. It comprises a highly extensible class inheritance hierarchy and a plug-in system. A user can add a new analysis function for a specific kind of UD by developing a plug-in. UDAS runs analysis tasks in a distributed manner, which increases analysis speed.

Second, we design a universal framework for distributed index systems for UD (UDFDIS). Building on UDAS and addressing the problems common to different kinds of UD, we implement UDFDIS as a universal and extensible system with high performance, reliability, and availability. Based on UDFDIS, we implement different kinds of distributed indexes for different kinds of UD. We then describe in detail the design of the index cluster, the search cluster, the message exchange mechanism, metadata management, and the system execution process.

Finally, we present a data partitioning policy based on locality-sensitive hashing (LSH) and, following this policy, design a local indexing policy that combines LSH and SH. We apply these policies to UDFDIS, design the structure of the local index file, and run a series of experiments to test the feasibility and effectiveness of the policies. As a result, we implement search over massive unstructured data based on UDFDIS.
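
The abstract does not say which LSH family or node-assignment rule the thesis uses. As an illustration only, the following minimal sketch shows one way LSH-based data partitioning could work, assuming random-hyperplane (sign) hashing and a simple modulo mapping from hash signature to index node. All function names and parameters are hypothetical and are not taken from UDFDIS.

```python
import random


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions (assumed LSH family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Pack the sign of each projection into an integer signature, one bit per hyperplane."""
    sig = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig


def assign_node(vector, hyperplanes, num_nodes):
    """Map a feature vector to an index node (hypothetical rule: signature modulo node count)."""
    return lsh_signature(vector, hyperplanes) % num_nodes


def partition(vectors, hyperplanes, num_nodes):
    """Group vector indices by assigned node, so similar vectors tend to share a node."""
    shards = {node: [] for node in range(num_nodes)}
    for idx, vec in enumerate(vectors):
        shards[assign_node(vec, hyperplanes, num_nodes)].append(idx)
    return shards
```

Under these assumptions, vectors separated by a small angle usually receive the same signature and therefore land on the same node, so a query needs to be routed only to the node (or few nodes) matching its own signature.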
Keywords/Search Tags: Big Data, Unstructured, Distributed, High-Dimensional Data, High-Dimensional Index