Design And Implementation Of The Data Analysis System Based On Hadoop

Posted on:2013-09-01

Degree:Master

Type:Thesis

Country:China

Candidate:T Liu

GTID:2248330371467573

Subject:Information security

Abstract/Summary:

In massive data processing, the ability to extract latent value and a sound basis for decision-making from vast amounts of data efficiently and quickly is becoming a core competitive advantage for enterprises. The importance of data analysis is beyond doubt, but as data is generated ever faster and in ever larger volumes, data processing technology faces growing challenges. Uncovering the deeper meaning of the valuable data hidden in this mass and turning it into actionable information has become a problem that Internet companies must address. This thesis analyzes the problems currently encountered in massive data processing, namely data collection, data storage, data analysis, and querying. A comparison between the traditional data analysis model based on relational databases and a Hadoop-based mass data system shows that, for massive data processing, Hadoop offers scalability, low cost, and high throughput, among other characteristics. The thesis examines the problems that traditional relational database queries encounter with massive data, introduces NoSQL databases, compares them with traditional relational databases, and summarizes the advantages and disadvantages of NoSQL databases and the scenarios in which they are useful. A performance analysis of MapReduce yields quantitative results for the CPU, I/O, and network overhead of the MapReduce framework, from which recommendations for optimizing MapReduce performance are derived. On this basis, a Hadoop-based data analysis system was designed, tested, and applied in practice.

The main work of this thesis is as follows:
1. Analyzed and compared the advantages and disadvantages of NoSQL databases and traditional relational databases.
2. Quantitatively analyzed the I/O, CPU, and network overhead of the MapReduce programming framework and gave optimization advice.
3. Used a distributed data collection system to solve the problems of collecting vast amounts of data and collecting logs in real time.
4. Used the Hadoop framework, with HDFS solving the problem of mass data storage and the MapReduce programming framework solving the problem of mass data processing.
5. Used AvatarNode to improve on the single-NameNode design of the Hadoop framework, enhancing the stability of the Hadoop cluster.
6. Designed a Hadoop-based massive data processing system, tested it, and applied it in practice.
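Item 4 relies on MapReduce for mass data processing, and item 2 concerns its CPU, I/O, and network overhead. The following is a minimal in-process Python sketch of a MapReduce job that counts HTTP status codes in log lines. It is not Hadoop's Java API and not the author's implementation. The log format, the function names (`map_log_line`, `combine`, `reduce_counts`, `run_job`), and the use of a shuffled-record count as a rough proxy for network traffic are all illustrative assumptions. The optional combiner is a standard MapReduce technique for reducing shuffle volume; the abstract does not say whether it is among the thesis's own recommendations.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


# Hypothetical log format assumed for illustration: "<timestamp> <status> <url>"
def map_log_line(line: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (status_code, 1) for each well-formed log line."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[1], 1


def combine(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, int]]:
    """Optional combiner: pre-aggregate one mapper's output locally,
    so fewer records have to cross the network during the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: sum all partial counts received for one key."""
    return key, sum(values)


def run_job(splits: list[list[str]], use_combiner: bool) -> tuple[dict[str, int], int]:
    """Run map -> (combine) -> shuffle -> reduce entirely in memory.

    Returns the final counts and the number of records shuffled,
    used here only as a rough stand-in for shuffle network cost."""
    shuffled = defaultdict(list)
    shuffled_records = 0
    for split in splits:  # each split plays the role of one mapper's input
        pairs = [kv for line in split for kv in map_log_line(line)]
        if use_combiner:
            pairs = combine(pairs)
        for key, value in pairs:
            shuffled[key].append(value)  # group values by key, like the shuffle
            shuffled_records += 1
    # Reducers see keys in sorted order, mirroring Hadoop's sort phase.
    result = dict(reduce_counts(k, v) for k, v in sorted(shuffled.items()))
    return result, shuffled_records


splits = [
    ["t1 200 /a", "t2 404 /b", "t3 200 /c"],
    ["t4 200 /a", "t5 500 /d", "t6 200 /a"],
]
print(run_job(splits, use_combiner=False))  # ({'200': 4, '404': 1, '500': 1}, 6)
print(run_job(splits, use_combiner=True))   # ({'200': 4, '404': 1, '500': 1}, 4)
```

With the combiner enabled, each mapper sends one record per distinct key instead of one per input line, so the shuffle drops from 6 to 4 records in this toy run while the final counts stay the same. A combiner is only safe when the reduce operation is commutative and associative, as summation is, because the framework may apply it zero or more times.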

Keywords/Search Tags:

cloud computing, mass data processing, hadoop, mapreduce, nosql database, hbase

Related items

1	Research Of A Mass Transaction Record Query System Based On Hadoop
2	Optimization And Application Research Of MapReduce Computing Model Based On Hadoop
3	Mass Sales Data Processing Platform Design And Implementation
4	The Research Of Mapreduce Implementing Of Text Classification Algorithm Based On Mass Data
5	Research On The Mass Data Processing Method Of Vehicle Mounted IOT Based On Hadoop Platform
6	Data Mining Based On Hadoop Platform
7	Research Of Massive Data Processing And Mining In Database Marketing Based On Hadoop
8	The Study On The Key Technologies Of Data Retrieval In Cloud Database
9	Research On Security Of NoSQL Databases Based On Hadoop
10	The Research And Design Of Distributed Vertical Search Engine