Research And Application On Big Data Processing Based On Hadoop Platform

Posted on:2014-05-06

Degree:Master

Type:Thesis

Country:China

Candidate:F Jiang

Full Text:PDF

GTID:2268330401963310

Subject:Computer Science and Technology

Abstract/Summary:

We live in an era of data explosion. With the spread of cloud computing, the rapid development of the Internet, the automation of traditional industries, advances in informatization, and the digitization of people's lives, we are surrounded by massive amounts of data. The sharp and continuing growth in data scale brings enormous value, but it also poses severe challenges. Many enterprises, especially large ones, have turned their attention to how to store, manage, and process huge amounts of data.

Web log processing is one of the promising fields in mass data processing. By analyzing web logs, enterprises can observe and monitor the operating status of their systems. They can also count how many users visit a website, learn who accesses it and what they pay attention to, and find out which terminals and browsers users prefer. This information helps enterprises improve system availability and user experience. However, because website logs are typically large, comprehensive, and hard to interpret, traditional log processing on a single machine is becoming increasingly difficult. Faced with rapidly growing web logs, people need to extract valuable conclusions from huge volumes of log data efficiently. This paper focused on how to process massive web log records efficiently.

This paper investigated Hadoop, a widely recognized and prevailing technology, as the solution. Hadoop consists of two main components: HDFS (Hadoop Distributed File System) and MapReduce. HDFS is the distributed file system provided by a Hadoop cluster, and MapReduce is a distributed computing framework. Combining the two components makes the processing of mass log data efficient.

This paper first studied Hadoop cluster technology and then, on the basis of Hadoop's principles, designed a log-handling model based on Hadoop. A Hadoop cluster of four nodes was set up to process log data of different scales. Through comparison with a single-machine system, this paper argued that, with good design, the Hadoop system achieved better performance in log processing.
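
As an illustration of the MapReduce idea applied to web logs, the following is a minimal sketch and not the log-handling model designed in the thesis. It simulates the map, shuffle, and reduce phases in a single Python process to count hits per URL. The log layout (Common Log Format), the sample lines, and all function names are assumptions made for this example; on a real Hadoop cluster the input would be read from HDFS and the phases would run in parallel across nodes.

```python
import re
from collections import defaultdict

# Assumed input layout: Common Log Format, e.g.
# 10.0.0.1 - - [06/May/2014:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)


def map_phase(line):
    """Map step: emit a (url, 1) pair for each line that parses."""
    match = LOG_PATTERN.match(line)
    if match:
        yield match.group(3), 1  # group 3 is the requested URL


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one URL."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical sample data; the last line is malformed and is skipped.
SAMPLE_LOG = [
    '10.0.0.1 - - [06/May/2014:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [06/May/2014:10:00:01 +0800] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [06/May/2014:10:00:02 +0800] "GET /index.html HTTP/1.1" 304 0',
    'malformed line',
]

if __name__ == "__main__":
    for url, hits in sorted(run_job(SAMPLE_LOG).items()):
        print(url, hits)
```

Because the map step handles each line independently and the reduce step handles each key independently, the same logic can be split across many machines, which is the property a cluster exploits. Emitting the client address or the browser string instead of the URL would yield visitor or browser counts in the same way.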

Keywords/Search Tags:

Hadoop, HDFS, MapReduce, big-data, web-log-processing

Related items

1	Design And Implementation Of Data Processing Platform Based On Hadoop
2	Research And Application On Big Data Processing Based On Hadoop Platform
3	Research And Application Of Data Processing Based On Hadoop
4	The Research And Analysis Of Hadoop Small File Processing Method
5	Research On Distributed Processing Of Massive Video Data Based On Hadoop
6	Join Processing And Optimizing On Large Data Sets Based On Hadoop Framework
7	Mass Sales Data Processing Platform Design And Implementation
8	The Performance Optimization And Improvement Of MapReduce In Hadoop
9	Research Of Massive Data Processing And Mining In Database Marketing Based On Hadoop
10	Research And Implementation Of Sales Forecast In Hadoop-based Enterprise Marketing System