Design And Implementation Of Unstructured Data Unified Storage Platform

Posted on: 2014-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y P He
Full Text: PDF
GTID: 2268330395489218
Subject: Computer application technology
Abstract/Summary:
Data on the Internet is growing and changing rapidly and continuously, not only in volume but also in form. From traditional text to web documents, images, audio, and video, the Internet is now dominated by a huge amount of semi-structured and unstructured data. This fast-growing and varied unstructured data poses great challenges for storage management.

First, this paper reviews storage solutions that have been proposed for massive, heterogeneous, unstructured data, and then points out the problems of these existing solutions and the key issues in unstructured data storage management.

Then, to address the storage problem of massive, heterogeneous, linked unstructured data, this paper proposes a unified storage system for unstructured data, called D-Ocean Repository. It addresses the key issues of such a storage system, including metadata management, unified storage interfaces, heterogeneous storage engines, and high data availability and consistency, and it integrates various storage engines such as HDFS, HBase, MySQL, and XMLDB. In addition, a selection mechanism for these hybrid storage engines is proposed to make the storage system more efficient.

Based on the D-Ocean Repository, the paper designs and implements a batch task processing system for unstructured data. The system takes advantage of the unified storage system to process various types of unstructured data in a unified way. It implements an efficient parallel data processing system based on the MapReduce framework, which lets computing resources work efficiently with the storage system.

Finally, the paper implements the CrossMedia News Retrieval System based on D-Ocean, which demonstrates the practicability of the unified storage system.
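As an illustration only, the combination of a unified storage interface, a metadata catalog, and an engine selection step described in the abstract could be sketched as follows. The abstract does not give the actual D-Ocean Repository API or its selection criteria, so every name here (`Repository`, `select_engine`, `InMemoryEngine`, `LARGE_OBJECT_BYTES`) and every routing rule is a hypothetical assumption, and the in-memory engines merely stand in for real HDFS, HBase, MySQL, and XMLDB clients.

```python
from dataclasses import dataclass
from typing import Dict, Protocol

# Hypothetical threshold: objects at or above this size go to the file-system engine.
LARGE_OBJECT_BYTES = 1024 * 1024


class StorageEngine(Protocol):
    """Minimal interface every backend is assumed to expose."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryEngine:
    """Stand-in for a real backend; stores blobs in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


@dataclass
class Metadata:
    """Catalog entry recording where an object was stored."""

    key: str
    media_type: str
    size: int
    engine: str


def select_engine(media_type: str, size: int) -> str:
    """Pick an engine name from object properties (assumed rules, not the thesis's)."""
    if media_type in ("text/xml", "application/xml"):
        return "xmldb"
    if size >= LARGE_OBJECT_BYTES:
        return "hdfs"
    if media_type == "application/x-record":  # hypothetical type for relational rows
        return "mysql"
    return "hbase"


class Repository:
    """Single entry point that hides which engine holds each object."""

    def __init__(self, engines: Dict[str, StorageEngine]) -> None:
        self.engines = engines
        self.catalog: Dict[str, Metadata] = {}

    def put(self, key: str, data: bytes, media_type: str) -> Metadata:
        engine_name = select_engine(media_type, len(data))
        self.engines[engine_name].put(key, data)
        meta = Metadata(key, media_type, len(data), engine_name)
        self.catalog[key] = meta
        return meta

    def get(self, key: str) -> bytes:
        # The catalog lookup tells us which engine to read from.
        meta = self.catalog[key]
        return self.engines[meta.engine].get(key)
```

A caller would construct `Repository` with one engine per name (`"hdfs"`, `"hbase"`, `"mysql"`, `"xmldb"`) and use only `put` and `get`; the catalog is what allows reads to be routed back to the right engine without the caller knowing where the object lives.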
Keywords/Search Tags: Unstructured Data, Unified Storage Platform, Batch Processing