Font Size: a A A

An Efficient Web Traversal Pattern Mining Algorithm Based On Suffix Array

Posted on:2006-09-28Degree:MasterType:Thesis
Country:ChinaCandidate:T JingFull Text:PDF
GTID:2168360155952979Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The discovery of user traversal pattern is one of the most important tasks of Web application mining, which falls into sequential pattern mining domain. Some researchers accomplish Web log mining task using sequential pattern mining algorithms such as AprioriAll, AprioriSome, [1] and GSP (General Sequential Patterns mining algorithm [2]). The approach taken by Chen et al is also based on traditional data mining technique [3]. They introduce MFR (Maximal Forward Reference), divide session into transaction sequence, and then use FS (Full Scan) and SS (Selective Scan), which is similar to association rules, to mine the frequent traversal path. Similar work can be found in [4]. All of the above-mentioned algorithms based on Apriori adopt generate-and-test paradigm, which generate candidate sets first, then test whether items in the candidate set satisfy a given minimal support. The process is iterated till there is no new candidate set generated. Some improved version use heuristic rule to reduce the number of candidate set, but multiple-scan of the whole data set is still needed and incremental data can not be processed effectively. But web traversal pattern mining is different from traditional sequential pattern mining problem in the following respects: (1) A traversal pattern is a consecutive sequence of elements, whereas transaction sequences are not necessarily consecutive. (2) Elements in a traversal pattern are singletons whereas elements in traditional sequences are set of elements. In [5], sequential constraint is added to classical Concept Lattice data structure, and ordered concept lattice is defined, which is used in web user traversal patterns discovery. Although the algorithm thus obtained is incremental in nature involving only one scan of the whole data set, the data structure is complicated for tackling the problem, with above O(n2logn) time complexity for the construction of Ordered Concept Lattice.
Keywords/Search Tags:Efficient
PDF Full Text Request
Related items