
Design And Implementation Of Web Log Analysis System

Posted on: 2014-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
Full Text: PDF
GTID: 2268330425483109
Subject: Software engineering
Abstract/Summary:
With the explosive growth of Internet data, identifying the knowledge and regularities hidden in these massive data sets has become an unavoidable problem, and Web mining offers a good solution. Web log mining is an important research direction within Web mining. It aims to extract users' browsing behavior and interests from large volumes of Web log data, so that the site structure can be tailored and page and service recommendations can be made more relevant to each user. This thesis analyzes both the theory and the complete process of Web log mining comprehensively and systematically, and presents an improvement to pattern mining algorithms.

First, the raw Web log data are preprocessed through data cleaning, user identification, session identification, path completion and transaction identification. Preprocessing not only filters the data but also converts the Web log into a transactional database, which lays the foundation for pattern mining.

Second, the ideas behind association rule mining and the Apriori algorithm are discussed in depth. The thesis considers a known drawback of Apriori: it must scan the database repeatedly to generate candidate sets. To address this, the thesis proposes an algorithm called soft maximal association rule mining, which combines soft set theory with association rule mining. As a new tool for dealing with uncertainty, the soft set is simple and distinctive in its model description and has been applied successfully to decision-making problems. A transaction database described with a soft set can represent richer knowledge and information, so mining association rules on the soft set can achieve better results. The proposed soft maximal association rule algorithm avoids a brute-force search over the support subsets of the attributes. It not only guarantees the precision of the mining results but also shows a clear advantage in time complexity.

Finally, the thesis designs and implements a Web log analysis system. In the system, rules are extracted from university Web logs using the soft maximal association rule algorithm, and the results are presented to the user through the system interface. In addition, the system's performance analysis module computes statistics on the number of page hits, page dwell time, user sources and popular pages, so that the site administrator can see the details of user visits. These results serve as a reference for improving the website.
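As an illustration of the pipeline described above, the following is a minimal Python sketch of two of its stages: session identification and level-wise frequent itemset counting in the style of plain Apriori. It is not the thesis's soft maximal association rule algorithm, which the abstract does not specify in detail. The `(user, timestamp, page)` record layout, the 30-minute session timeout, the function names and the sample data are all assumptions made for illustration. The sketch mainly shows why Apriori is costly: every candidate level triggers another full pass over the transactions.

```python
from collections import defaultdict
from itertools import combinations

SESSION_TIMEOUT = 30 * 60  # seconds; assumed threshold, not taken from the thesis


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, page) records into sessions (sets of pages).

    A new session starts when the gap between two consecutive requests
    from the same user exceeds `timeout`.
    """
    by_user = defaultdict(list)
    for user, ts, page in records:
        by_user[user].append((ts, page))

    sessions = []
    for hits in by_user.values():
        hits.sort()
        current, last_ts = [], None
        for ts, page in hits:
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(frozenset(current))
                current = []
            current.append(page)
            last_ts = ts
        if current:
            sessions.append(frozenset(current))
    return sessions


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets seen in >= min_support transactions."""
    items = {item for t in transactions for item in t}
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # One full scan of the transactions per level: the repeated-scan cost.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical, already-cleaned log records: (user, unix_time, page).
records = [
    ("u1", 0, "/home"), ("u1", 60, "/news"),
    ("u1", 5000, "/home"), ("u1", 5060, "/courses"),
    ("u2", 10, "/home"), ("u2", 100, "/news"),
]
sessions = sessionize(records)
frequent = apriori(sessions, min_support=2)
```

Here each session is treated as one transaction, which is a simplification of the transaction identification step named in the abstract.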
Keywords/Search Tags: Web log, data mining, association rule, soft set, soft maximal association rule mining