Design And Implementation Of Web Log Data Analysis System Based On Hadoop

Posted on: 2018-03-07    Degree: Master    Type: Thesis
Country: China    Candidate: C W Liu    Full Text: PDF
GTID: 2348330536481610    Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, more and more enterprises have built Web systems of their own, the scale of the network keeps growing, and the Web has become the largest database in the world. Governments, businesses, and individuals all face the problem of processing large volumes of Web data. In the past, staff counted information on the server with vim, shell commands, or shell scripts, but as data is produced ever faster and in ever greater volume, it has become important to mine the potential value in a company's Web data and turn that value into a basis for decisions. With this rapid growth, data processing technology faces enormous challenges, and the company's traditional log analysis methods can no longer meet the needs of storage capacity and computational efficiency. To improve the storage capacity and computational efficiency of log analysis and to make user interaction friendly, this paper proposes a distributed solution based on Hadoop and implements a Hadoop-based Web log data mining and analysis system. The following aspects were studied:

(1) The paper first surveys the common methods and research directions of data mining at home and abroad.

(2) The functional requirements of the analysis system are analyzed in detail, and the structure of the MySQL tables used to store the analysis results is designed. Three subsystems are then analyzed and implemented: the data collection subsystem collects data into HDFS with Flume; the data analysis subsystem first cleans the off-line data and stores it in HBase, then MapReduce jobs and Hive scripts analyze the data in HBase and store the results in MySQL; the data display subsystem presents the data as charts based on the Spring and MyBatis frameworks.

(3) Hadoop components are combined with the Spring framework: MapReduce programs and Hive scripts implement the different functional modules, an Oozie workflow ties the MapReduce and Hive jobs together and runs them as scheduled tasks, and Spring, MyBatis, and Highcharts form the Spring MVC presentation layer.

(4) The shortcomings of the traditional CART algorithm are analyzed and the algorithm is improved: the paper parallelizes the calculation of the Gini coefficient within an attribute, the Gini coefficient between attributes, and the gain in surface error rate used by CCP (cost-complexity pruning).

(5) The paper completes unit tests of the individual functional modules, a system integration test, and performance tests of the analysis modules. The results show that the analysis system is reliable, efficient, and user-friendly.

The final research result is a complete Hadoop-based distributed Web log data mining and analysis system, which provides an API for the underlying big-data analysis framework and for querying the analysis results.
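The abstract does not reproduce any of the system's MapReduce jobs, but the log-analysis step it describes can be illustrated with a minimal map/reduce sketch. The log format, URL-count metric, and function names below are assumptions for illustration (the thesis does not specify them); in the Hadoop Streaming style, the mapper emits (key, 1) pairs and the framework's shuffle groups them before the reducer sums per key.

```python
from collections import defaultdict

def map_line(line):
    """Map phase: parse one log line, emit (url, 1).
    Assumes a hypothetical Apache combined-log layout where the HTTP
    request sits between the first pair of double quotes."""
    parts = line.split('"')
    if len(parts) < 2:
        return None
    request = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.1']
    if len(request) < 2:
        return None
    return (request[1], 1)

def reduce_counts(pairs):
    """Reduce phase: sum the counts per URL, as the reducer would
    after Hadoop's shuffle has grouped pairs by key."""
    totals = defaultdict(int)
    for url, n in pairs:
        totals[url] += n
    return dict(totals)

if __name__ == "__main__":
    logs = [
        '1.2.3.4 - - [07/Mar/2018:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
        '1.2.3.5 - - [07/Mar/2018:10:00:01 +0800] "GET /index.html HTTP/1.1" 200 512',
        '1.2.3.6 - - [07/Mar/2018:10:00:02 +0800] "GET /about.html HTTP/1.1" 200 128',
    ]
    pairs = [p for p in (map_line(l) for l in logs) if p]
    print(reduce_counts(pairs))  # page views per URL
```

In the real system such logic would run as a distributed MapReduce job over HDFS files collected by Flume, with results written to MySQL rather than printed.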
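Point (4) parallelizes Gini calculations for CART, but the abstract does not show the formula. As a reminder of what is being computed, here is a small sketch of the standard Gini impurity, 1 − Σ pᵢ², and the weighted Gini of a binary split that CART minimizes when choosing an attribute; the function names are illustrative, and the thesis's parallel version would distribute these sums across MapReduce tasks.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_i^2).
    0 means the set is pure (one class only)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Weighted Gini of a binary split, the quantity CART minimizes
    when evaluating a candidate split on an attribute."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

For example, a perfectly separating split such as `gini_split(["yes", "yes"], ["no", "no"])` scores 0, while an uninformative split keeps the impurity of the parent set.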
Keywords/Search Tags: Web log, data mining, off-line data, Hadoop