
Design and Implementation of the Data Analysis System Based on Hadoop

Posted on: 2013-09-01 | Degree: Master | Type: Thesis
Country: China | Candidate: T Liu | Full Text: PDF
GTID: 2248330371467573 | Subject: Information security
Abstract/Summary:
In massive data processing, the ability to efficiently and quickly extract potential value and a basis for decision making from large volumes of data will become an enterprise's core competitiveness. The importance of data analysis is beyond doubt, but as data is generated ever faster and data volumes keep growing, data processing technology faces more and more challenges. How to extract deeper meaning from massive data and turn it into actionable information has become a problem that Internet companies must address.

This thesis analyzes the problems currently encountered in massive data processing: data collection, data storage, data analysis, and data query. A comparison between the traditional relational-database-based data analysis model and a Hadoop-based mass data system shows that Hadoop offers scalability, low cost, and high throughput for massive data processing. The thesis analyzes the problems that traditional relational databases encounter when querying massive data, introduces NoSQL databases and compares them with traditional relational databases, and summarizes the advantages, disadvantages, and suitable use cases of NoSQL databases. It also analyzes MapReduce performance, obtains quantitative results for the CPU, IO, and network overhead of the MapReduce framework, and offers suggestions for optimizing MapReduce performance. On this basis, a data analysis system based on Hadoop was designed, tested, and applied in practice.

The work carried out in this thesis covers the following aspects:
1. Analysis and comparison of the advantages and disadvantages of NoSQL databases and traditional relational databases.
2. Quantitative analysis of the IO, CPU, and network overhead of the MapReduce programming framework, with optimization advice.
3. Use of a distributed data collection system to solve the problems of collecting massive amounts of data and collecting logs in real time.
4. Use of the Hadoop framework, with HDFS to solve the problem of mass data storage and the MapReduce programming framework to solve the problem of mass data processing.
5. Use of Avatar Node to improve on the single NameNode of the Hadoop framework and enhance the stability of the Hadoop cluster.
6. Design of a massive data processing system based on Hadoop, which was tested and applied in practice.
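The MapReduce programming model that the abstract relies on for mass data processing can be illustrated with a small sketch. This is not the thesis's implementation and does not use Hadoop itself: it is a single-process Python imitation of the map, shuffle, and reduce phases, and the three-field log layout (client, path, status code) and all function names are assumptions made for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (status_code, 1) for one log line.

    The 'client path status' layout is a hypothetical log format.
    """
    fields = line.split()
    if len(fields) >= 3:
        yield fields[2], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input.
sample_log = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /missing 404",
    "10.0.0.1 /about.html 200",
]
result = run_job(sample_log)
```

In a real Hadoop job the map and reduce functions would run on many nodes over HDFS blocks, and the shuffle step would move data across the network, which is where the IO and network overhead mentioned in the abstract would arise.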
Keywords/Search Tags: cloud computing, mass data processing, Hadoop, MapReduce, NoSQL database, HBase