Research On Sequential Pattern Mining And Web Usage Mining

Posted on:2011-05-09

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Q W Yang

Full Text:PDF

GTID:1228330362453702

Subject:Management Science and Engineering

Abstract/Summary:

Sequential pattern mining is a data mining technique applied to sequence databases. It aims to find relationships among sequential events and the specific ordering among them. Sequential pattern mining extends association rule mining and is widely applied in customer behavior analysis, web browsing pattern analysis, analysis of scientific experiments, early diagnosis of diseases, natural disaster forecasting, DNA sequence analysis, and so on. Research on and application of sequential pattern mining techniques have advanced greatly, but key issues remain, e.g., high algorithmic complexity, low efficiency on large-scale datasets, and poor adaptability. This dissertation focuses on sequential pattern mining methods and their application to web usage mining, using data mining methods and genetic algorithm theory. The main contributions of the dissertation are summarized as follows.

First, data mining concepts and various data mining techniques for different types of data are presented, and the development of data mining is reviewed. Clustering is introduced in particular, including its basic theory, algorithms, and detailed process.

Second, since the k-means algorithm is sensitive to noise and outliers, is easily trapped in local optima, and in particular requires the number of clusters to be specified a priori, the Genetic k-medoids algorithm (GKMD) is presented to overcome these disadvantages. GKMD treats the number of clusters as a variable in the fitness function. The chromosome encodes the number of clusters together with the positions of the medoids, and corresponding crossover and mutation operators are designed. GKMD can therefore determine the optimal number of clusters during the evolution process. In addition to the global search capability of the GA, GKMD uses effective heuristic search methods to enhance its local search ability. Experiments illustrate that GKMD performs robustly on datasets with noise and outliers, and can both determine the optimal number of clusters and obtain higher clustering accuracy.

Third, a novel two-stage scheme for mining sequential patterns is proposed. In the first phase, it clusters the sequences into several groups. An n-tuple data structure is designed to represent sequences and reduce the dimensionality. A more understandable and accurate method for measuring similarities among these sequences is presented. The new similarity measure, SMCS, captures more specific information about sequences, so that the similarity is computed more accurately. In the second phase, stratograms are employed to visualize the patterns. Stratograms provide more information, such as the frequency of the sequences, which helps discover and extract significant patterns.

Fourth, the proposed sequential pattern mining method is extended and applied to web usage mining. An ontology-based representation for web sessions is proposed, and the corresponding semantic web session clustering and visualization method is presented. A new similarity measure for semantic web sessions, called SMSCP, is defined on the semantic common paths of users' navigation. Various factors related to web users' interests are included in SMSCP. The web sessions are clustered using the improved k-medoids algorithm and the single-link hierarchical algorithm separately. Stratograms are employed to visualize the clustering results. The validity of the similarity measure is verified by comparison with other similarity measures on a specific dataset. The experimental results represented by stratograms also validate the effectiveness of the proposed similarity measure. The knowledge extracted from the stratograms helps make recommendations for users' navigation or helps site designers optimize the web site structure.
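
To make the session-clustering idea in the abstract concrete, the following is a minimal Python sketch of plain alternating k-medoids over page-visit sequences. It is not the dissertation's GKMD algorithm: the number of clusters is fixed by hand and there is no genetic search. The distance is also an assumption, a simple stand-in based on the longest common subsequence, because the abstract does not define SMCS or SMSCP. All function names and the toy sessions are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def distance(a, b):
    """Stand-in dissimilarity in [0, 1] (assumption, not SMCS/SMSCP)."""
    if not a and not b:
        return 0.0
    return 1.0 - lcs_length(a, b) / max(len(a), len(b))


def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(seqs, k, max_iter=100):
    """Alternate between assignment and medoid update until stable."""
    n = len(seqs)
    dist = [[distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # naive deterministic start: first k items
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            # New medoid: member with the smallest total distance to its cluster.
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy web sessions (hypothetical page names).
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "cart"],
    ["home", "products", "checkout"],
    ["blog", "post1", "post2"],
    ["blog", "post2"],
    ["blog", "post1", "about"],
]
medoids, labels = k_medoids(sessions, k=2)
print("medoid sessions:", [sessions[m] for m in medoids])
print("cluster labels:", labels)
```

Because the medoids are actual sessions rather than averaged points, each cluster can be summarized by a representative navigation path, which is one reason medoid-based methods suit sequence data where a mean is not defined.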

Keywords/Search Tags:

Sequential pattern mining, Genetic algorithm clustering, Web usage mining, Web session clustering, Stratogram

Related items

1	Web Mining And Its Applications
2	The Research Of The Clustering Mining Based On The Web Usage Data Preprocess
3	Study On Crucial Techniques Of Web Usage Mining
4	The Research Of Web Usage Mining Algorithm Based On Web Log
5	Research On Session Clustering In Web Usage Mining
6	Mining API Usage Patterns Based On Closed Partial Order
7	The Mining Of API Usage Pattern Based On Frequent Closed Partial Order
8	Research On User Pattern Discovery In Web Usage Mining
9	Research On The Sequence Mining Algorithm And Its Application In User Behavior Analysis
10	The Research And Implement Of Methods On Web Usage Mining