
Research And Implementation Of Web Log Mining Based On Distributed Computing System

Posted on: 2016-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: H B Wang
Full Text: PDF
GTID: 2308330461984144
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of Internet and cloud computing technology, more and more people access services from the cloud. Cloud education systems are built on cloud computing technology and supported by web services, and they aim to provide education-related services to users. As cloud education applications grow richer, the volume of web log data keeps increasing. This massive web log data contains a great deal of useful information. On the one hand, access information can be read directly from the logs. On the other hand, users' access preferences and other potentially useful information can be extracted with data mining algorithms. However, finding useful information quickly in massive web log data is a difficult problem. Traditional stand-alone web log mining systems and data mining algorithms fall far short of what is needed, so data mining algorithms and systems built on distributed parallel environments have become an inevitable direction of development.
Hadoop is an open source distributed platform for large-scale data processing. It includes MapReduce, a distributed computing framework, and HDFS, a distributed file system. This paper mainly addresses the bottleneck that traditional data mining faces when dealing with massive log data. It implements a parallel version of a traditional data mining algorithm and a Hadoop-based parallel computing platform suited to massive data processing. The log analysis system applies distributed technology in a cloud education scenario and uses the improved algorithm. The system analyzes user information and predicts user behavior, and it presents the results to the system manager through a visual interface.
Through in-depth study of the related literature and technology, this paper carries out parallel optimization of an association rule algorithm and applies it to the web log analysis system.
The web log analysis platform, built on a distributed computing platform, consists of a collection module, a distributed storage module, a preprocessing module, a distributed processing module, and a visual display module. The function of each module is verified by building a distributed cluster and the web log mining system. The accuracy and efficiency of the algorithm are fully tested. The test results show that the performance of the Hadoop-based web log mining system is greatly improved compared with the single-node system. Applying the improved association rule algorithm to the web log mining system yields high accuracy and stable performance.
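The abstract does not describe the improved association rule algorithm itself, so the following is only a minimal sketch of the general idea behind running the support-counting step of association rule mining in MapReduce style. It is written in plain Python rather than against the Hadoop API. The function names (`map_phase`, `reduce_phase`, `frequent_itemsets`), the sample sessions, and the support threshold are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def map_phase(session, k):
    """Emit (itemset, 1) for every k-page combination visited in one session."""
    pages = sorted(set(session))  # deduplicate and fix an order so itemsets compare equal
    return [(itemset, 1) for itemset in combinations(pages, k)]


def reduce_phase(pairs):
    """Sum the emitted counts per itemset, as a reducer would per key."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(sessions, k, min_support):
    """Return the k-itemsets that occur in at least min_support sessions."""
    pairs = []
    # On a cluster, each input split would be mapped in parallel;
    # here the map calls simply run one after another.
    for session in sessions:
        pairs.extend(map_phase(session, k))
    counts = reduce_phase(pairs)
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


# Hypothetical user sessions reconstructed from preprocessed web logs.
sessions = [
    ["/home", "/course", "/quiz"],
    ["/home", "/course"],
    ["/home", "/quiz"],
    ["/course", "/quiz", "/home"],
]
result = frequent_itemsets(sessions, k=2, min_support=3)
```

With these assumed sessions, the pairs `("/course", "/home")` and `("/home", "/quiz")` each appear in three sessions and pass the threshold, while `("/course", "/quiz")` appears in only two and is dropped. Because each session is mapped independently and counts are merged per key, this step lends itself to distribution across nodes.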
Keywords/Search Tags:distributed system, log mining, cloud computing, MapReduce