Design And Implementation Of Web Log Data Analysis System Based On Hadoop

Posted on: 2018-03-07    Degree: Master    Type: Thesis
Country: China    Candidate: C W Liu    Full Text: PDF
GTID: 2348330536481610    Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, more and more enterprises have built Web systems of their own, the scale of the network keeps growing, and the Web has become the largest database in the world. Governments, businesses, and individuals all face the problem of processing large volumes of Web data. In the past, staff counted information on the server with vim, shell commands, or shell scripts, but as data is produced ever faster and in ever greater volume, it has become important to mine the potential value in a company's Web data and turn that value into a basis for decisions. With this rapid growth, data processing technology faces enormous challenges, and the company's traditional log analysis methods can no longer meet the needs of storage capacity and computational efficiency. To improve the storage capacity and computational efficiency of log analysis and to make user interaction friendly, this paper proposes a distributed solution based on Hadoop and implements a Hadoop-based Web log data mining and analysis system. The following aspects were studied:

(1) The paper first surveys the common methods and research directions of data mining at home and abroad.

(2) The functional requirements of the analysis system are analyzed in detail, and the structure of the MySQL tables used to store the analysis results is designed. Three subsystems are then analyzed and implemented: the data collection subsystem collects data into HDFS with Flume; the data analysis subsystem first cleans the off-line data and stores it in HBase, then MapReduce jobs and Hive scripts analyze the data in HBase and store the results in MySQL; the data display subsystem presents the data as charts based on the Spring and MyBatis frameworks.

(3) Hadoop components are combined with the Spring framework: MapReduce programs and Hive scripts implement the different functional modules, an Oozie workflow ties the MapReduce and Hive jobs together and runs them as scheduled tasks, and Spring, MyBatis, and Highcharts form the Spring MVC presentation layer.

(4) The shortcomings of the traditional CART algorithm are analyzed and the algorithm is improved: the paper parallelizes the calculation of the Gini coefficient within an attribute, the Gini coefficient between attributes, and the gain in surface error rate used by CCP (cost-complexity pruning).

(5) The paper completes unit tests of the individual functional modules, a system integration test, and performance tests of the analysis modules. The results show that the analysis system is reliable, efficient, and user-friendly.

The final research result is a complete Hadoop-based distributed Web log data mining and analysis system, which provides an API for the underlying big-data analysis framework and for querying the analysis results.
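The abstract does not reproduce any of the system's MapReduce jobs, but the log-analysis step it describes can be illustrated with a minimal map/reduce sketch. The log format, URL-count metric, and function names below are assumptions for illustration (the thesis does not specify them); in the Hadoop Streaming style, the mapper emits (key, 1) pairs and the framework's shuffle groups them before the reducer sums per key.

```python
from collections import defaultdict

def map_line(line):
    """Map phase: parse one log line, emit (url, 1).
    Assumes a hypothetical Apache combined-log layout where the HTTP
    request sits between the first pair of double quotes."""
    parts = line.split('"')
    if len(parts) < 2:
        return None
    request = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.1']
    if len(request) < 2:
        return None
    return (request[1], 1)

def reduce_counts(pairs):
    """Reduce phase: sum the counts per URL, as the reducer would
    after Hadoop's shuffle has grouped pairs by key."""
    totals = defaultdict(int)
    for url, n in pairs:
        totals[url] += n
    return dict(totals)

if __name__ == "__main__":
    logs = [
        '1.2.3.4 - - [07/Mar/2018:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
        '1.2.3.5 - - [07/Mar/2018:10:00:01 +0800] "GET /index.html HTTP/1.1" 200 512',
        '1.2.3.6 - - [07/Mar/2018:10:00:02 +0800] "GET /about.html HTTP/1.1" 200 128',
    ]
    pairs = [p for p in (map_line(l) for l in logs) if p]
    print(reduce_counts(pairs))  # page views per URL
```

In the real system such logic would run as a distributed MapReduce job over HDFS files collected by Flume, with results written to MySQL rather than printed.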
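Point (4) parallelizes Gini calculations for CART, but the abstract does not show the formula. As a reminder of what is being computed, here is a small sketch of the standard Gini impurity, 1 − Σ pᵢ², and the weighted Gini of a binary split that CART minimizes when choosing an attribute; the function names are illustrative, and the thesis's parallel version would distribute these sums across MapReduce tasks.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_i^2).
    0 means the set is pure (one class only)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Weighted Gini of a binary split, the quantity CART minimizes
    when evaluating a candidate split on an attribute."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

For example, a perfectly separating split such as `gini_split(["yes", "yes"], ["no", "no"])` scores 0, while an uninformative split keeps the impurity of the parent set.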
Keywords/Search Tags: Web log, data mining, off-line data, Hadoop