Research On Frequent Traversal Sequence Mining Based On Web Click Streams

Posted on:2007-10-20

Degree:Master

Type:Thesis

Country:China

Candidate:X J Zhang

Full Text:PDF

GTID:2178360212495463

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

By analyzing web usage mining situation of foreign and domain, we found that the previous algorithms exist many problems for mining frequent traversal sequence(FTS) in static and dynamic web clickstreams environment. These algorithms may be inefficient in dealing with the following four problems. Firstly, Web association rules mining often ignores the temporal character of sessions. Secondly, simple algorithms of frequent sequences mining may generate very huge candidate sequences, since those algorithms do not adopt the constraint methods. Thirdly, most incremental mining algorithms just consider inserting the transactions into the original traversal sequences. However, few algorithms refer to the situation that when the new traversal sequences are inserted into and the old traversal sequences are deleted from session database. Fourthly, there are some difficulties to extend the mining technique to dynamic web clickstreams. The main difficulty is that the existing False-positive based algorithms can not overcome the dilemma between the recall of sequence and the precision of mining result.Firstly, two efficient algorithms are proposed to discover maximum frequent traversal sequences. The two algorithms all use the bidirectional constraint strategy to constrain each page and sequence in session database. The first algorithm successfully limits the appearing frequency of the meaningless page, and the second algorithm solves the problems caused by the longer sequence. Two kinds of algorithms outperform GENmax, FPmax, SPADE, MSPS and GSP for mining maximum frequent traversal sequences.Next, according to sequence model, this paper proposes a novel algorithm, to account for the incremental mining problems. The algorithm utilizes the constraint strategy and web site structure to find the new FTS from the inserted and deleted part of the database such that the size of candidate sequence sets andsearch space of FTS can be reduced. The performance of the algorithm is better than IncSpan and MFTP when the previous mining results and constraint are used.Finally, based on False-Negative approach and time-sensitive sliding window, this paper proposes an efficient algorithm to balance the conflict between recall and precision in dynamic web click streams. The algorithm adopts the weighted harmonic count function with constrain to calculate the support count of each traversal sequence, and adjusts the value of relaxation ratioρby changing the regulatory factorξ. The algorithm can simultaneously meet recall and precision, and outperforms Lossy Counting and MineSW.We implement the above four algorithms with Java and C++. All of our experiments are performed on the four datasets, M0606A, M0607B, BMS-WebView-1 and BMS-WebView-2 to mine maximum frequent traversal sequences and FTS. The experimental results show the feasibility and effectiveness of our algorithms.

Keywords/Search Tags:

Web click streams, Frequent traversal sequence, Dwell time, Session, Bidirectional constraint, Time-sensitive sliding windows

PDF Full Text Request

Related items

1	Research On Fault Detection Method Of Linear Switched System Under Relaxed Persistent Dwell Time Constraint
2	Switched System Analysis Based On Dwell Time
3	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams
4	Mining Association Rules In Data Streams
5	The Research On The Related Problems Of Association Rule Mining Over Data Streams
6	Research On Frequent Patterns Mining Algorithm Based Sliding Window In Data Streams
7	Study On Frequent Data Mining From Uncertain Data Streams
8	Research On TSN Gating Scheduling Based On Constraint Policy Crossing
9	Multiobjective Evolutionary Algorithm Research On Vehicle Routing Problem With Time Windows And Two Dimensional Loading Constraint
10	The Research On Algorithm Of Frequent Itemsets Mining In Data Streams