
Research And Application Of Clustering Algorithm Based Web Log Mining

Posted on: 2012-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Zhuang
GTID: 2178330332985813
Subject: Computer software and theory
Abstract/Summary:
Data mining extracts information and knowledge from large amounts of data, and the extracted knowledge can be applied in many fields. Clustering is among the most widely used data mining techniques, and clustering analysis has become an active research area within data mining.

Applying clustering to Web log mining helps uncover user access patterns, such as access rates and groups of users with similar behavior, from logs that record users' browsing activity. This knowledge can help optimize a website's topology and provide personalized or intelligent services that improve the site's performance.

The thesis starts from the basic clustering algorithms: hierarchical clustering, k-means, and fuzzy C-means (FCM) are analyzed and implemented. It then discusses two weaknesses of these algorithms, the choice of the number of clusters and the choice of the initial cluster centers, and improves the traditional algorithm with respect to both. A comparative study verifies the effectiveness of the improved algorithm, which is then applied to clustering analysis of the Web logs of the quality course website of Donghua University.

The thesis focuses on the following tasks:
1) Based on the analysis and implementation of the basic clustering algorithms, a comparative study is conducted on a standard dataset, comparing the clustering results of hierarchical clustering, k-means, and fuzzy C-means.
2) To address the initial-center and cluster-number problems, an improved algorithm is proposed. The thesis analyzes how to estimate the number of clusters for fuzzy C-means, as well as a distance measure based on the Pearson correlation coefficient. On this basis, a rough-set-based improved fuzzy C-means algorithm is proposed and implemented, and it is compared with the traditional algorithm to verify its effectiveness.
3) The improved algorithm is applied to Web log mining: it is run on the Web logs of the quality course website of Donghua University, and the clustering results are analyzed to identify features of user browsing behavior and to derive suggestions for optimizing the website.
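The abstract names fuzzy C-means with a Pearson-correlation distance but gives no formulas. The following is a minimal sketch of the standard FCM loop (fuzzifier m, membership update, weighted-mean center update) with the distance d(x, y) = 1 - r(x, y) swapped in. Assumptions, not taken from the thesis: the 1 - r form of the distance, treating constant vectors as uncorrelated, keeping the Euclidean center update as a heuristic, and passing initial centers explicitly as a placeholder for the rough-set initialization, which the abstract does not describe. The names `pearson_distance` and `fuzzy_c_means` and the sample data are hypothetical.

```python
import math


def pearson_distance(a, b):
    """Distance 1 - r, where r is the Pearson correlation of a and b (0 = same trend, 2 = opposite)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # assumption: a constant vector is treated as uncorrelated
    # Clamp tiny negative values caused by floating-point rounding when r is ~1.
    return max(0.0, 1.0 - cov / math.sqrt(var_a * var_b))


def fuzzy_c_means(data, centers, m=2.0, max_iter=100, tol=1e-6, dist=pearson_distance):
    """Standard FCM iteration. Returns (u, centers), where u[i][j] is the membership of sample j
    in cluster i. `centers` are the initial centers (placeholder for rough-set initialization)."""
    c, n, dim = len(centers), len(data), len(data[0])
    centers = [list(v) for v in centers]
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        new_u = [[0.0] * n for _ in range(c)]
        for j, x in enumerate(data):
            d = [dist(x, v) for v in centers]
            hits = [i for i in range(c) if d[i] < 1e-12]
            if hits:
                # Sample coincides with one or more centers: share membership among them.
                for i in hits:
                    new_u[i][j] = 1.0 / len(hits)
                continue
            for i in range(c):
                new_u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2 / (m - 1)) for k in range(c))
        # Center update: mean weighted by u_ij ** m (the exact minimizer only for Euclidean distance).
        for i in range(c):
            w = [new_u[i][j] ** m for j in range(n)]
            total = sum(w)
            if total > 0:
                centers[i] = [sum(w[j] * data[j][f] for j in range(n)) / total for f in range(dim)]
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(c) for j in range(n))
        u = new_u
        if change < tol:
            break
    return u, centers


# Hypothetical page-view vectors for four users (rows) over four pages (columns).
sessions = [[5, 1, 0, 0], [4, 2, 0, 1], [0, 0, 3, 5], [1, 0, 4, 6]]
u, _ = fuzzy_c_means(sessions, centers=[sessions[0], sessions[2]])
labels = [max(range(len(u)), key=lambda i: u[i][j]) for j in range(len(sessions))]
print(labels)  # → [0, 0, 1, 1]
```

Because Pearson distance ignores scale and offset, users who favor the same pages end up together even when their total visit counts differ, which is one plausible reason to prefer it over Euclidean distance for browsing data.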
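The abstract mentions estimating the number of clusters for fuzzy C-means without naming the criterion. One common approach runs FCM for each candidate c and scores the resulting membership matrices with a validity index. The sketch below uses Bezdek's partition coefficient (PC) and partition entropy (PE) over the rule-of-thumb range c = 2 .. floor(sqrt(n)); these indices, that range, and the helper names are illustrative assumptions, not necessarily what the thesis uses. The membership matrices in the example are hypothetical.

```python
import math


def partition_coefficient(u):
    """Bezdek's partition coefficient (1/n) * sum_ij u_ij^2; lies in [1/c, 1], higher is crisper."""
    n = len(u[0])
    return sum(v * v for row in u for v in row) / n


def partition_entropy(u):
    """Partition entropy -(1/n) * sum_ij u_ij * ln(u_ij); lies in [0, ln c], lower is crisper."""
    n = len(u[0])
    return -sum(v * math.log(v) for row in u for v in row if v > 0) / n


def candidate_cluster_numbers(n):
    """Rule-of-thumb search range c = 2 .. floor(sqrt(n)) for n samples."""
    return range(2, math.isqrt(n) + 1)


def choose_cluster_number(memberships):
    """memberships maps c -> FCM membership matrix (c rows, n columns).
    Returns the c with the highest partition coefficient; ties go to the smaller c."""
    return max(sorted(memberships), key=lambda c: partition_coefficient(memberships[c]))


# Hypothetical membership matrices from FCM runs with c = 2 and c = 3 on the same 4 samples.
u2 = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.8]]
u3 = [[0.6, 0.5, 0.1, 0.1], [0.3, 0.4, 0.2, 0.3], [0.1, 0.1, 0.7, 0.6]]
print(list(candidate_cluster_numbers(20)))   # → [2, 3, 4]
print(choose_cluster_number({2: u2, 3: u3}))  # → 2
```

PC and PE depend only on memberships and tend to drift toward small c, so in practice they are often checked alongside an index that also uses the data geometry, such as the Xie-Beni index.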
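The abstract applies the improved algorithm to the Web logs of the Donghua University quality course website but does not describe how the logs are preprocessed. A typical first step turns raw log lines into a user-by-page access matrix whose rows can serve as input vectors for clustering. This sketch assumes NCSA Common Log Format, identifies users by client IP address (a simplification; real sessionization usually also uses time-outs and user agents), and drops non-GET requests, non-2xx responses, and embedded images, stylesheets, and scripts. The function name, the ignored-suffix list, and the sample log lines are hypothetical.

```python
import re
from collections import defaultdict

# NCSA Common Log Format: host ident authuser [date] "method url protocol" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)')
# Embedded resources are not page views (assumed list).
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def access_matrix(lines):
    """Return (users, pages, matrix) where matrix[u][p] counts views of pages[p] by users[u]."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = CLF.match(line)
        if not match:
            continue  # skip malformed lines
        host, _, method, url, status, _ = match.groups()
        path = url.split("?", 1)[0]  # ignore query strings
        if method != "GET" or not status.startswith("2") or path.lower().endswith(IGNORED_SUFFIXES):
            continue
        counts[host][path] += 1
    users = sorted(counts)
    pages = sorted({p for per_user in counts.values() for p in per_user})
    matrix = [[counts[u][p] for p in pages] for u in users]
    return users, pages, matrix


log = [
    '10.0.0.1 - - [10/Oct/2011:13:55:36 +0800] "GET /course/intro.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2011:13:56:01 +0800] "GET /course/logo.png HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2011:13:57:12 +0800] "GET /course/ch1.html HTTP/1.1" 200 4100',
    '10.0.0.2 - - [10/Oct/2011:14:02:00 +0800] "GET /course/ch1.html HTTP/1.1" 200 4100',
    '10.0.0.2 - - [10/Oct/2011:14:03:30 +0800] "GET /course/missing.html HTTP/1.1" 404 209',
]
users, pages, matrix = access_matrix(log)
print(users)   # → ['10.0.0.1', '10.0.0.2']
print(pages)   # → ['/course/ch1.html', '/course/intro.html']
print(matrix)  # → [[1, 1], [1, 0]]
```

Counting views per page keeps the matrix simple; weighting by time spent on a page, if timestamps are paired into sessions, is a common refinement.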
Keywords: clustering analysis, fuzzy C-means clustering, rough sets, Web log mining