Research On The Algorithm For Mining Access Patterns From Web Logs

Posted on: 2005-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xiao
GTID: 2168360152469260
Subject: Computer application technology
Abstract/Summary:
In recent years, data mining and its application to the World Wide Web have been active research fields. The application of data mining techniques to the WWW is referred to as Web data mining. Web data mining covers three areas: the first, Web content mining, is the process of discovering information from sources across the World Wide Web; the second is Web structure mining; the third, Web access pattern mining, is the process of discovering Web access patterns, including association rules and sequential patterns, from website logs. In this paper, we investigate how to mine Web access patterns efficiently from large collections of Web log records. Access patterns can be mined using sequential pattern mining techniques. However, these techniques may generate a huge number of candidate patterns, which reduces mining speed.

To reduce the number of candidate patterns during Web access pattern mining, we present a new algorithm, WAPM. The key consideration is how to ease the tedious support counting and candidate generation operations in the mining procedure. The algorithm has two main parts. First, a compact data structure, the WAS-tree, is devised to register access sequences and their corresponding counts, so that the tedious support counting can be avoided. Once the WAS-tree is built, all remaining mining is performed on it, and the original access database is no longer needed. The WAS-tree is constructed efficiently by scanning the access sequence database just twice. Second, an efficient recursive algorithm is proposed to enumerate access patterns from the WAS-tree. No candidate generation is required in the mining procedure, and only patterns with sufficient support are considered.

Finally, WAS-Miner, an implementation of a system based on the WAPM algorithm on the Windows 2000 Server platform, is presented.
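The two-scan construction described above can be illustrated with a minimal sketch. The abstract does not give the actual layout of the WAS-tree, so the code below assumes a generic prefix tree with per-node counts. The names `Node` and `build_tree`, and the choice to count support once per sequence, are hypothetical and not taken from the thesis.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a prefix tree over access sequences (assumed layout)."""
    event: Optional[str] = None
    count: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_tree(sequences, min_support):
    """Build a counted prefix tree in two scans of the sequence database."""
    # Scan 1: count, for each event, how many sequences contain it.
    support = Counter()
    for seq in sequences:
        support.update(set(seq))
    frequent = {e for e, c in support.items() if c >= min_support}

    # Scan 2: insert each sequence restricted to frequent events.
    # Shared prefixes reuse existing nodes and increment their counts.
    root = Node()
    for seq in sequences:
        node = root
        for event in seq:
            if event not in frequent:
                continue
            node = node.children.setdefault(event, Node(event))
            node.count += 1
    return root, frequent
```

Under these assumptions, each node's count records how many access sequences share the prefix ending at that node, which is why later mining steps could read counts from the tree instead of rescanning the log.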
Keywords/Search Tags: Web log, Web access pattern, Sequential pattern, Web mining