
Research On Technologies Of Log Big Data Analysis Platform

Posted on: 2016-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: H J Zhou
Full Text: PDF
GTID: 2308330461984202
Subject: Computer application technology

Abstract/Summary:
The Internet is developing at an ever-faster pace, and the log data it generates is growing just as rapidly. Web logs are large in volume, varied, heterogeneous, and dynamic. For Internet companies, processing this log data in a timely and effective way has become a major challenge: the traditional single-node, centralized processing model cannot cope with data at this scale. At the same time, the demand for log analysis keeps growing and changing, so log analysis engineers are constantly beset by new requirements, and companies must invest considerable time to keep up with them.

This thesis studies distributed storage, computation, and scheduling, proposes a processing workflow for large-scale log data, and uses Hadoop and Hive to build a log big data analysis platform. The proposed workflow consists of four stages: log collection, preprocessing, storage, and analysis. The collection stage gathers log data from distributed web servers so that it can be managed and used conveniently. The preprocessing stage cleans and transforms the unformatted logs. The storage stage keeps the log data in Hadoop and Hive. Building on these three stages, the analysis stage queries the log data with Hive SQL.

The platform models the collection, preprocessing, and analysis stages by abstracting each of them into a task: a collecting task, a preprocessing task, and an analysis task. It offers users interfaces for configuring these tasks, so the platform can be used simply by configuring the three tasks; the platform itself is responsible for scheduling and running them. Users only need to wait for the results and do not need to know how the platform schedules and executes the tasks. To schedule and run the tasks efficiently, this thesis presents a framework called the task scheduling and running engine. The engine is built on a traditional master-slave architecture, implements a static priority scheduling algorithm with a failure-handling mechanism, and executes tasks in a distributed, parallel manner. The last chapter verifies the practicability and efficiency of the platform through an experiment.
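
To make the preprocessing stage concrete, the following minimal Python sketch cleans and transforms unformatted web-server log lines into tab-separated records suitable for loading into Hive. The log format, field names, and file names are assumptions for illustration; the abstract does not specify them.

    import re

    # Assumed Apache-style log format; the actual format handled by the
    # platform is not specified in the abstract.
    LOG_PATTERN = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\S+)'
    )

    def clean_line(raw_line):
        """Parse one raw log line; return a tab-separated record, or None if malformed."""
        match = LOG_PATTERN.match(raw_line)
        if match is None:
            return None  # cleaning: drop lines that do not parse
        fields = match.groupdict()
        if not fields['bytes'].isdigit():
            fields['bytes'] = '0'  # transforming: normalize '-' to a number
        return '\t'.join(fields[k] for k in
                         ('ip', 'time', 'method', 'url', 'status', 'bytes'))

    with open('access.log') as src, open('access_clean.tsv', 'w') as dst:
        for line in src:
            record = clean_line(line)
            if record is not None:
                dst.write(record + '\n')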
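The storage and analysis stages can be sketched in the same way. The snippet below loads the cleaned records into a Hive table and runs an example Hive SQL query. The PyHive client library, the HiveServer2 address, the table schema, and the particular query are all assumptions made only to keep the example self-contained; the thesis does not state which analyses the platform runs.

    from pyhive import hive

    conn = hive.connect(host='localhost', port=10000)  # assumed HiveServer2 address
    cursor = conn.cursor()

    # Storage stage: expose the cleaned, tab-separated logs as a Hive table.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS web_log (
            ip STRING, time STRING, method STRING,
            url STRING, status INT, bytes BIGINT)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    """)
    cursor.execute("LOAD DATA LOCAL INPATH 'access_clean.tsv' INTO TABLE web_log")

    # Analysis stage: an illustrative Hive SQL query over the stored logs.
    cursor.execute("""
        SELECT url, COUNT(*) AS hits
        FROM web_log
        GROUP BY url
        ORDER BY hits DESC
        LIMIT 10
    """)
    for url, hits in cursor.fetchall():
        print(url, hits)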
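Finally, a minimal single-process sketch of static priority scheduling with failure handling, of the kind the task scheduling and running engine is described as implementing. The real engine is distributed and master-slave; here threads stand in for slave nodes, the priorities and task names are hypothetical, and inter-task dependencies (collection before preprocessing before analysis) are not modeled because the abstract does not describe how the engine handles them.

    import heapq
    import threading

    class Task:
        def __init__(self, name, priority, action):
            self.name = name
            self.priority = priority  # static: fixed when the task is created
            self.action = action

    ready = []                 # master's ready queue (min-heap: lower value = higher priority)
    lock = threading.Lock()

    def submit(task):
        with lock:
            heapq.heappush(ready, (task.priority, id(task), task))

    def worker():
        """Stand-in for a slave node: repeatedly run the highest-priority task."""
        while True:
            with lock:
                if not ready:
                    return
                _, _, task = heapq.heappop(ready)
            try:
                task.action()
            except Exception:
                # Failure mechanism: re-enqueue the failed task so the
                # master reschedules it on another worker.
                submit(task)

    submit(Task('collect', 0, lambda: print('collecting logs')))
    submit(Task('preprocess', 1, lambda: print('cleaning logs')))
    submit(Task('analyze', 2, lambda: print('running Hive SQL')))

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()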
Keywords/Search Tags: web log, big data analysis, distributed computation, Hadoop, Hive