The Design And Implementation Of Massive Data Storage And Calculation Platform Based On Hadoop

Posted on: 2015-12-23  Degree: Master  Type: Thesis
Country: China  Candidate: H Chen  Full Text: PDF
GTID: 2298330452950139  Subject: Communication and Information System
Abstract/Summary:
With the rapid development of communication and Internet technology, the data generated by mobile networks and their users is growing at a terabyte-scale rate, and enterprise data processing and storage systems face a serious challenge. The ability to manage and store this data has become a problem that must be considered carefully, and cloud computing and distributed systems are well suited to solving it. Hadoop's distributed file system, together with random access to the stored data, provides the storage conditions that big data requires. Many enterprises also favor Hadoop for its high performance, reliability, low cost, and ease of extension. HBase is a subproject of the Hadoop ecosystem: it differs from a relational database in how data is updated and stored, and it inherits the advantages of Hadoop. The interfaces HBase provides make it convenient for developers to use. As a result, a great number of enterprises use Hadoop's distributed system to process and store massive data.

This thesis analyzes the problems that communication enterprises may encounter in data processing and storage, designs the overall architecture of a system to address them, and implements the functions of its main parts. The author's main work is as follows:

(1) The requirements for province-wide storage and querying of Internet log files, as put forward by a subsidiary of China Telecom, are analyzed, and a framework is proposed based on the system's security and stability needs. A laboratory Hadoop cluster environment is also built to support the implementation of the system.

(2) According to module function, the overall architecture is divided into four parts: data collection, data processing, data storage, and data query. The functions of each part are designed in detail. The implementation focuses on the data processing, data storage, and data query parts.

(3) The data processing part mainly uses MapReduce programs to process the data, chiefly to prepare it for storage. Storage is implemented in three different ways: the relational database Oracle, Lucene, and the distributed HBase system. The web front end uses caching when rendering pages. The server side mainly handles data queries, and query results are returned to the front-end page as JSON.

(4) The storage and query performance of Oracle, Lucene, and HBase is compared and analyzed using four data sets of different scales. The test results are analyzed in light of Hadoop's storage capabilities. The results show that the Hadoop-based distributed system can satisfy the users' requirements, and some methods for improving system performance are proposed.
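
The processing, storage, and query flow described in item (3) can be illustrated with a small single-process sketch. This is not the thesis's implementation: the log format, the field names, and the functions `map_record`, `reduce_records`, `run_job`, and `query_as_json` are all hypothetical, and a plain Python dictionary stands in for the HBase table. It only shows the general shape of a map step, a reduce step that builds one row per key, and a query whose result is serialized as JSON.

```python
import json
from collections import defaultdict

# Hypothetical log format (an assumption, not taken from the thesis):
# "user_id<TAB>timestamp<TAB>url"
SAMPLE_LOG = [
    "u1\t2014-05-01T10:00:00\thttp://example.com/a",
    "u2\t2014-05-01T10:00:05\thttp://example.com/b",
    "u1\t2014-05-01T10:01:00\thttp://example.com/c",
    "malformed line",
]


def map_record(line):
    """Map step: emit (user_id, record) pairs and skip malformed lines."""
    parts = line.split("\t")
    if len(parts) != 3:
        return []
    user_id, timestamp, url = parts
    return [(user_id, {"time": timestamp, "url": url})]


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_records(user_id, records):
    """Reduce step: build one row per user with visits ordered by time."""
    ordered = sorted(records, key=lambda r: r["time"])
    return {"row_key": user_id, "visits": ordered, "count": len(ordered)}


def run_job(lines):
    """Run map, shuffle, and reduce in-process; the dict stands in for a table."""
    pairs = [pair for line in lines for pair in map_record(line)]
    return {key: reduce_records(key, values) for key, values in shuffle(pairs).items()}


def query_as_json(store, user_id):
    """Query step: look up one row and return the result as a JSON string."""
    row = store.get(user_id)
    return json.dumps({"found": row is not None, "result": row})


store = run_job(SAMPLE_LOG)
print(query_as_json(store, "u1"))
```

In a real deployment the map and reduce functions would run as distributed tasks over files in HDFS and the rows would be written to an actual store; the sketch keeps everything in memory so the data flow is easy to follow.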
Keywords/Search Tags: Hadoop, Massive Data, HBase, MapReduce, HDFS