
Research And Application Of Clustering Algorithm Based On Web Log Mining

Posted on: 2017-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: J J Ma
Full Text: PDF
GTID: 2348330503993060
Subject: Software engineering
Abstract/Summary:
The amount of information on the network is growing rapidly as the Internet continues to develop, and the gap between information supply and information acquisition is becoming more and more prominent. On the one hand, many users want to skip redundant information and reach the desired content directly. On the other hand, site operators want to discover the access patterns of user groups in order to adjust the structure of the site, provide personalized services, and carry out appropriate business promotion, thereby making the site more attractive to users. User clustering based on Web logs groups users according to their behavior, and so offers a good solution to this problem.

After a review of a large body of relevant literature, this thesis summarizes the theory of Web log mining, data preprocessing, and cluster analysis. It proposes an improved transaction identification algorithm and an improved K-Means clustering algorithm, and analyzes the improved algorithms through the design and implementation of a user clustering analysis system.

The quality of the original log data is too low for clustering, so data preprocessing is needed to obtain data suitable for clustering. After accurate data cleaning, user identification, and session identification, two problems remain: the granularity of user sessions is too coarse for clustering, and the original user transaction identification method judges page types inaccurately. To address these problems, an improved transaction identification algorithm is proposed. It accurately distinguishes navigation pages from content pages and builds a user access tree to obtain effective user transactions, laying a solid data foundation for the cluster analysis phase.

Next, the classical K-Means clustering algorithm is studied in depth.
A density-based fuzzy partition algorithm is proposed to solve the problem of selecting the initial center points. First, a fuzzy partition based on distance is performed to obtain high-density regions; then the regions are merged using a density-based method; finally, an appropriate point in each high-density region is taken as an initial cluster center for partition clustering. This reduces the likelihood that the clustering reaches only a locally optimal solution, effectively reduces the number of iterations, and improves the quality of clustering.

Finally, the accuracy of the improved K-Means algorithm is verified on the classical Iris clustering data set. In addition, a user clustering analysis system based on Web logs is designed and implemented to carry out data preprocessing and user clustering on real Web log data. The effectiveness of the improved transaction identification algorithm and the improved K-Means algorithm is thereby verified, putting the theory into practice.
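
The idea of seeding K-Means from high-density regions can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the abstract does not specify the fuzzy partition or the region-merging rule, so the sketch substitutes a much simpler stand-in that counts neighbors within a fixed radius and picks dense points that lie far apart. The names `density_seeds`, `k_means`, and `radius`, and the `2 * radius` separation rule, are assumptions made for illustration only.

```python
import math


def density_seeds(points, k, radius):
    """Pick up to k initial centers from dense, mutually separated regions.

    Simplified stand-in for a density-based initialization (assumption):
    it does not implement fuzzy partitioning or region merging.
    """
    # Density of a point = number of points within `radius` (itself included).
    density = [
        sum(1 for q in points if math.dist(p, q) <= radius) for p in points
    ]
    # Visit points from densest to sparsest (ties keep input order).
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        # Accept a point only if it is far from every seed chosen so far.
        if all(math.dist(points[i], s) > 2 * radius for s in seeds):
            seeds.append(points[i])
        if len(seeds) == k:
            break
    return seeds


def k_means(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    labels = []
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return centers, labels, iteration


# Two small, well-separated groups of 2-D points (made-up illustrative data).
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),
]
seeds = density_seeds(points, k=2, radius=1.0)
centers, labels, iterations = k_means(points, seeds)
print(seeds, labels, iterations)
```

With well-separated seeds, each starting center already sits inside a different group, which is why this kind of initialization tends to need few iterations. Note that `density_seeds` can return fewer than `k` seeds when the data has fewer than `k` regions separated by more than `2 * radius`; a real implementation would need a fallback for that case.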
Keywords/Search Tags: cluster analysis, transaction identification, K-Means, density-based fuzzy partition