
Massive Data Processing Based On Hadoop2.0

Posted on: 2016-07-16 | Degree: Master | Type: Thesis
Country: China | Candidate: M Zhu | Full Text: PDF
GTID: 2308330470479783 | Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, people have begun to drown in information. Internet companies that provide network services must handle large volumes of information every day in order to analyze user needs, the effect of their products, and so on. Some of this data requires near-real-time processing, and traditional software processing methods cannot meet the storage-space and processing-time requirements. Against this background, big data technology emerged to overcome the space and time limits of data processing. Hadoop has been the dominant force in big data technology in recent years. It includes the MapReduce programming model, the HDFS storage model, and the Hive data warehouse tool, each of which has its own advantages in solving practical problems. The main purpose of this paper is to study how to solve practical problems using Hadoop. The paper selects a distributed data capture case to study the specific application of the various components of the Hadoop ecosystem. The case is divided into five basic steps: task generation, URL generation, data extraction, data aggregation, and data output, carried out respectively by the Task Generator, URL Generator, Data Extractor, Data Aggregator, and Common Publisher modules. The modules execute in sequence and pass data from one to the next. Finally, performance analysis of the test results for the case indicates that Hadoop is indeed superior to the traditional data processing system, and the application scope of each component is analyzed in light of the experimental results and the principles of the components. In implementing the case, the modules depend on one another, and any module may fail. Oozie is a high-performance, fault-tolerant framework for the big data field, so it was chosen as the scheduling and monitoring system.
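The five-step flow described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use Hadoop or Oozie. It is plain Python, and every function body, the URL scheme, the sample host names, and the "SUCCEEDED"/"KILLED" status labels are assumptions made for illustration. It only shows the idea of stages that run in sequence, hand their output to the next stage, and stop the run when one stage fails.

```python
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def task_generator(sites: List[str]) -> List[Dict[str, str]]:
    # Hypothetical: one capture task per site name.
    return [{"site": site} for site in sites]


def url_generator(tasks: List[Dict[str, str]]) -> List[str]:
    # Hypothetical URL scheme: two pages per task.
    return [f"http://{t['site']}/page/{i}" for t in tasks for i in range(2)]


def data_extractor(urls: List[str]) -> List[Dict[str, Any]]:
    # Stand-in for real fetching: records the URL length, no network access.
    return [{"url": u, "length": len(u)} for u in urls]


def data_aggregator(records: List[Dict[str, Any]]) -> Dict[str, int]:
    # Sum the extracted values per host.
    totals: Dict[str, int] = {}
    for r in records:
        host = r["url"].split("/")[2]
        totals[host] = totals.get(host, 0) + r["length"]
    return totals


def common_publisher(totals: Dict[str, int]) -> List[str]:
    # Format the aggregated result as tab-separated lines.
    return sorted(f"{host}\t{total}" for host, total in totals.items())


PIPELINE: List[Stage] = [
    ("Task Generator", task_generator),
    ("URL Generator", url_generator),
    ("Data Extractor", data_extractor),
    ("Data Aggregator", data_aggregator),
    ("Common Publisher", common_publisher),
]


def run_pipeline(stages: List[Stage], data: Any) -> Dict[str, Any]:
    """Run stages in order; stop at the first failure and report it."""
    completed: List[str] = []
    for name, stage in stages:
        try:
            data = stage(data)
        except Exception as exc:
            return {"status": "KILLED", "failed": name,
                    "completed": completed, "error": str(exc), "output": None}
        completed.append(name)
    return {"status": "SUCCEEDED", "failed": None,
            "completed": completed, "error": None, "output": data}


if __name__ == "__main__":
    print(run_pipeline(PIPELINE, ["a.example", "b.example"]))
```

Recording which stage failed and which stages completed is the kind of information a scheduling and monitoring system needs in order to report on a run that was cut short. The sketch leaves out retries and distributed execution.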
Keywords/Search Tags: Hadoop, Massive data processing, Oozie, Data Capture