
Research And Application On Big Data Processing Based On Hadoop Platform

Posted on: 2014-05-06 | Degree: Master | Type: Thesis
Country: China | Candidate: F Jiang | Full Text: PDF
GTID: 2268330401963310 | Subject: Computer Science and Technology
Abstract/Summary:
We are now living in an era of data explosion. With the spread of cloud computing, the rapid development of the Internet, the automation of traditional industries, advances in informatization, and the digitization of everyday life, we are surrounded by massive amounts of data. The drastic and continuous growth in data scale not only brings enormous value to people but also poses severe challenges. Many enterprises, especially large ones, have turned their attention to how to store, manage, and process these huge volumes of data.

Web log processing is one of the promising fields in mass data processing. By analyzing and processing web logs, enterprises can observe and monitor the operating status of their systems. They can also count how many users visit a website, learn who accesses it and what they pay attention to, and find out which terminals and browsers users prefer. All of this information helps an enterprise improve system availability and user experience. However, because website log records are usually large in scale, comprehensive, and hard to interpret, it is becoming increasingly difficult to process them with traditional methods on a single machine. Faced with rapidly growing web logs, people need to mine valuable conclusions from huge volumes of logs efficiently. This paper therefore focused on how to handle mass web log records efficiently.

This paper investigated Hadoop, a widely recognized and prevailing technology, as the solution. Hadoop consists of two main components: HDFS (Hadoop Distributed File System) and MapReduce. HDFS is the distributed file system provided by a Hadoop cluster, and MapReduce is a distributed computing framework. By integrating the two components, mass log data can be processed efficiently. This paper first studied Hadoop cluster technology and then, on the basis of Hadoop's principles, designed a Hadoop-based log handling model. A Hadoop cluster of four nodes was established to process log data of different scales. By comparison with a single-machine system, this paper argued that, with good design, the Hadoop system achieved better performance in log processing.
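
To illustrate the kind of MapReduce log analysis the abstract describes, the following is a minimal single-process sketch in Python of counting requests per client address. It is not the thesis's log handling model, and it does not use the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`), the assumption that the first whitespace-separated field of each log line is the client address, and the sample log lines are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample lines in a common access-log layout (invented for this sketch).
SAMPLE_LOG = [
    '10.0.0.1 - - [06/May/2014:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [06/May/2014:10:00:01 +0800] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [06/May/2014:10:00:02 +0800] "GET /news.html HTTP/1.1" 200 2048',
    '',  # blank line, skipped by the map step
]


def map_phase(line):
    """Map step: emit (client_address, 1) for each non-empty log line.

    Assumes the client address is the first whitespace-separated field.
    """
    fields = line.split()
    if fields:
        yield fields[0], 1


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one client address."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in one process and return {address: hits}."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for address, hits in sorted(run_job(SAMPLE_LOG).items()):
        print(address, hits)
```

On a real cluster, the input lines would be read from files stored in HDFS, and the map and reduce steps would run in parallel across the nodes; the sketch only shows the data flow between the two steps.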
Keywords/Search Tags:Hadoop, HDFS, MapReduce, big-data, web-log-processing