The discovery of user traversal pattern is one of the most important tasks of Web application mining, which falls into sequential pattern mining domain. Some researchers accomplish Web log mining task using sequential pattern mining algorithms such as AprioriAll, AprioriSome, [1] and GSP (General Sequential Patterns mining algorithm [2]). The approach taken by Chen et al is also based on traditional data mining technique [3]. They introduce MFR (Maximal Forward Reference), divide session into transaction sequence, and then use FS (Full Scan) and SS (Selective Scan), which is similar to association rules, to mine the frequent traversal path. Similar work can be found in [4]. All of the above-mentioned algorithms based on Apriori adopt generate-and-test paradigm, which generate candidate sets first, then test whether items in the candidate set satisfy a given minimal support. The process is iterated till there is no new candidate set generated. Some improved version use heuristic rule to reduce the number of candidate set, but multiple-scan of the whole data set is still needed and incremental data can not be processed effectively. But web traversal pattern mining is different from traditional sequential pattern mining problem in the following respects: (1) A traversal pattern is a consecutive sequence of elements, whereas transaction sequences are not necessarily consecutive. (2) Elements in a traversal pattern are singletons whereas elements in traditional sequences are set of elements. In [5], sequential constraint is added to classical Concept Lattice data structure, and ordered concept lattice is defined, which is used in web user traversal patterns discovery. Although the algorithm thus obtained is incremental in nature involving only one scan of the whole data set, the data structure is complicated for tackling the problem, with above O(n2logn) time complexity for the construction of Ordered Concept Lattice.
|