
Research On The Algorithm For Mining Continuous Frequent Access Patterns From Web Logs

Posted on: 2007-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: S G Tang
GTID: 2178360242961925
Subject: Computer software and theory
Abstract/Summary:
Web mining is an emerging research direction in computer science, and the discovery of frequent access paths is an important part of it with strong practical significance. Frequent access paths fall into two kinds, continuous and non-continuous. Because the link structure of a web site constrains navigation and is itself continuous in character, mining continuous frequent access paths is more valuable in some cases. Analysis shows that general-purpose sequential pattern mining algorithms can be used to mine continuous frequent access paths, but their efficiency is low and, moreover, what they yield are only frequent access paths. Special-purpose mining algorithms are more efficient, but their domain of application is too narrow: they can only be used on Maximal Forward References. To solve this problem, we study the properties of access paths and, on that basis, propose an algorithm that mines continuous frequent access paths from Web logs. The algorithm improves on the WAP-Tree (Web Access Pattern Tree) and devises a new data structure, the IWAP-Tree (Improved Web Access Pattern Tree), which compresses storage space while keeping all the information needed for mining. At the same time, it abandons the traditional mining methods that generate frequent patterns bottom-up by joining and pruning. Instead, it adopts a partitioned search: a suffix tree is constructed for each frequent page node, and the continuous frequent access paths are mined by traversing that tree. The mining procedure requires no candidate generation, and a single pass yields all the continuous access paths that take the root of the IWAP-Tree as their suffix. An experimental system was designed and implemented, and it was used to compare the time and memory cost of the PAP-Mine (Postfix Access Patterns Mine) algorithm with that of the WAP (Web Access Patterns) algorithm. The experiments show that the PAP-Mine algorithm is more efficient and more stable.
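The idea of grouping paths by their last page and reading frequent continuous paths off a per-page suffix structure can be illustrated with a small sketch. This is not the thesis's IWAP-Tree or PAP-Mine; it is a plain trie built under stated assumptions: each session is a list of page identifiers, support is the number of sessions that contain a path as a contiguous run, and the names `Node`, `build_suffix_tries`, `mine_continuous_paths`, and the `max_len` cap are all hypothetical.

```python
from collections import defaultdict


class Node:
    """Trie node; the walk from a trie's root to this node spells a path backwards."""

    def __init__(self):
        self.children = {}   # previous page -> Node
        self.count = 0       # number of sessions containing this path
        self.last_sid = -1   # last session that was counted here


def _touch(node, sid):
    # Count each session at most once per path (sessions arrive in order).
    if node.last_sid != sid:
        node.count += 1
        node.last_sid = sid


def build_suffix_tries(sessions, max_len=5):
    """Build one trie per ending page, holding the reversed contiguous paths."""
    roots = defaultdict(Node)
    for sid, session in enumerate(sessions):
        for end in range(len(session)):
            node = roots[session[end]]
            _touch(node, sid)
            start = max(0, end - max_len + 1)
            # Walk backwards from the ending page, extending the path by one page.
            for i in range(end - 1, start - 1, -1):
                node = node.children.setdefault(session[i], Node())
                _touch(node, sid)
    return roots


def mine_continuous_paths(sessions, min_support, max_len=5):
    """Return {path tuple: support} for contiguous paths meeting min_support."""
    roots = build_suffix_tries(sessions, max_len)
    result = {}
    for page, root in roots.items():
        stack = [(root, (page,))]
        while stack:
            node, path = stack.pop()
            if node.count < min_support:
                continue  # longer paths ending in this one cannot be frequent
            result[path] = node.count
            for prev, child in node.children.items():
                stack.append((child, (prev,) + path))
    return result


if __name__ == "__main__":
    sessions = [list("abcd"), list("abce"), list("xbcd"), list("acbd")]
    for path, support in sorted(mine_continuous_paths(sessions, 2).items()):
        print("->".join(path), support)
```

In this sketch the traversal never generates candidates: it only follows branches that already exist in the trie and stops as soon as a node falls below the support threshold. Unlike the compressed structure described in the abstract, the trie stores every contiguous path up to `max_len`, so it trades memory for simplicity.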
Keywords/Search Tags: Web mining, Web logs, Frequent access patterns, Continuous frequent access patterns