
Research On The Solution Of Hybrid Storage Based On Hadoop

Posted on: 2017-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: D Z Yu
GTID: 2308330482494707
Subject: Software engineering
Abstract/Summary:
As Internet applications and content grow in popularity, more and more industries are integrating with the Internet, which produces large amounts of data. A review of existing technologies shows that they are driven by a growing demand for rich media: as users, we watch movies and create and upload photos and videos to the network. Not only is the volume of data increasing, but so is the rate at which it is generated. How to acquire and manage this data well, and how to choose effective methods to analyze it and extract value, has become the most important issue.

With the explosive growth of data, a hybrid storage platform that holds structured, semi-structured, and unstructured data is the foundation of data analysis. The classical ways of extending a data processing system are scaling up with expensive mainframe configurations or expanding outward by upgrading hardware for more processing power, but both are costly. Furthermore, deep analysis of massive data is still at a preliminary stage, and mainstream frameworks such as Hadoop MapReduce generally do not perform well in real-time analysis. A hybrid storage solution that links different data sources with Hadoop is therefore valuable.

This dissertation starts from Hadoop, a popular big data processing framework, and analyzes the existing principles of distributed storage and computing. Combining the characteristics of different types of databases with Hadoop cluster technology, it proposes a hybrid storage solution based on Hadoop.
The main work includes:
(1) An HDFS data exchange platform is implemented. Using the distributed file storage platform, different types of data can be shared between Hadoop and traditional database systems, which supports deeper data analysis.
(2) To address administrative issues such as the single control node in Hadoop and cross-rack and cross-cluster management, I design a multi-node management control structure, which improves system reliability and reduces the load on a single control node.
(3) For unstructured data storage, HBase, a Hadoop-based data storage product, is used as the storage database. To import massive data efficiently, I design a file transfer method that speeds up storage by converting file formats to suit HBase's file storage characteristics.
(4) For structured data storage and analysis, Hive, a Hadoop-based data storage product suited to structured data, is used as the offline data warehouse for deep data analysis, while real-time queries are handled by a traditional relational database. This dissertation implements the data import process from the relational database, through Hadoop, and finally into Hive.
(5) A dedicated FTP server is built that interacts with the Hadoop distributed file system and helps multiple client users access and manage data.
Keywords/Search Tags: Big Data, Hadoop, Structured Data, Unstructured Data, Storage