Research On Web Log Mining And Application Based On Association Rule

Posted on: 2015-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
Full Text: PDF
GTID: 2298330467489474
Subject: Systems analysis and integration
Abstract/Summary:
The World Wide Web is a huge, widely distributed, global information store that covers news, advertising, consumer information, financial management, education, government, e-commerce, and many other information services. The Web contains rich, dynamic hyperlinks and a large volume of access information, which provide abundant resources for data mining. Using these resources to discover users' potential interests and other useful information, and so to provide personalized and intelligent information services, has become a necessity for every website builder. Web data mining is a data mining technique that mines the data generated by the interaction between users and web servers. It can discover users' frequent access actions and behavior patterns, as well as hidden regularities in the data. This information can be used to improve the hyperlink structure between web pages, raise the quality of a website's service, and improve its performance; it can also feed suspicious activity back to network administrators to strengthen the site's security. How to improve the efficiency and accuracy of Web log mining, so that the urgent needs of web users can be understood and met, is therefore a subject well worth researching.

This paper introduces and analyzes the stages of Web log mining, and designs and implements a relatively general Web log mining system. The system mines Web log data to obtain association rules between frequent access paths and offers suggestions for optimizing the structure of the website. The main contents are as follows:

1. Analyze in depth the disadvantages of traditional session identification methods in the data preprocessing stage, and propose a new session identification method based on decision tree induction, which generates more realistic sessions.
2. Analyze the reasons for the low efficiency of the classic association rule algorithm, Apriori, and implement improvements to address them.
3. Design and implement a Web log mining system consisting of data reading, data preprocessing, pattern discovery, and result presentation modules. The data preprocessing module uses the new session identification method described above, and the pattern discovery module uses the improved Apriori algorithm, whose output supports pattern analysis aimed at improving website performance.

Finally, analyze the discovered frequent user access patterns to find the association rules between frequent access paths, and propose suggestions based on them.
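
For readers unfamiliar with the technique, the following is a minimal Python sketch of the classic (unimproved) Apriori algorithm applied to web sessions, followed by rule generation from the frequent page sets. It is not the thesis's improved algorithm or its decision-tree session identification method, neither of which is detailed in this abstract. The function names `apriori` and `association_rules`, the sample page paths, and the thresholds are all hypothetical and chosen only for illustration; each session is treated as an unordered set of visited pages.

```python
from itertools import combinations


def apriori(sessions, min_support):
    """Return {frozenset(pages): count} for every page set that appears
    in at least `min_support` sessions (classic level-wise Apriori)."""
    transactions = [frozenset(s) for s in sessions]

    # Level 1: count individual pages.
    counts = {}
    for t in transactions:
        for page in t:
            key = frozenset([page])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one pass over the sessions per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules lhs -> rhs whose
    confidence = support(lhs and rhs) / support(lhs) meets the threshold."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical sessions: each is the list of pages one visitor requested.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
    ["/home", "/products", "/cart"],
]
frequent = apriori(sessions, min_support=3)
for lhs, rhs, confidence in association_rules(frequent, min_confidence=0.8):
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```

The count step rescans every session once per level, which is one commonly cited source of Apriori's inefficiency on large logs.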
Keywords/Search Tags:web log mining, association rules, Apriori algorithm, session identity