Research Of Web User Clustering Model Based On Genetic Algorithm

Posted on: 2009-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: H L Liu
Full Text: PDF
GTID: 2178360245979831
Subject: Computer application technology
Abstract/Summary:
Web log mining is an important component of Web mining. Web logs contain a large amount of user access information, and analyzing them reveals user behavior patterns, so research on them is of both theoretical and practical significance. There are three major approaches to Web log mining: clustering analysis, association analysis, and sequence analysis. Clustering analysis is well suited to noisy and incomplete data sets, and for this reason it plays a critical role in modeling user behavior.

The k-means algorithm is the most widely used method in clustering analysis. Its main shortcomings are that it is sensitive to the initial values and easily gets trapped in a local optimum. Combining k-means with a genetic algorithm therefore lets the heuristic global optimization of the genetic algorithm be used to reach a better clustering.

First, the system discussed in this thesis encodes Web pages according to the topological structure of the Web site. This code stores the page hierarchy and the dependency relationships between pages, which helps improve the quality of Web user clustering. Then a set of user access behavior vectors based on this code is built from the Web logs, and a Web User Clustering Model Based on Genetic Algorithm (WUGC) is proposed to improve Web user clustering.

During clustering, the model applies the selection, crossover, and mutation operators of the genetic algorithm to the individuals: individuals with higher fitness are retained and evolved until the optimal result is found. The model does not depend on the choice of initial cluster centers or on the order in which the samples are input. It thus avoids the disadvantages of the k-means algorithm, namely sensitivity to initial values and a tendency to get trapped in local minima.

Finally, an experimental platform is designed that implements both the k-means algorithm and WUGC and uses them to cluster the same Web user data. The experimental results are then compared and analyzed. They show that WUGC achieves better clustering quality than the traditional k-means algorithm, but runs more slowly because of the genetic operations.
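The combination of k-means with selection, crossover, and mutation described above can be illustrated with a minimal Python sketch. It is not the WUGC implementation: the abstract does not specify the chromosome encoding, the fitness function, or the operators, so everything here is an assumption chosen for illustration. Each individual is a list of k cluster centers, fitness is taken as lower sum of squared errors (SSE), selection is a binary tournament with elitism, crossover is single-point, mutation adds Gaussian noise, and each child is refined by one k-means update. The names `ga_kmeans`, `kmeans_step`, `sse`, and all parameters are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans_step(points, centers):
    """One k-means update: assign points, then move centers to cluster means."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        groups[nearest].append(p)
    new_centers = []
    for center, group in zip(centers, groups):
        if group:
            new_centers.append(tuple(sum(dim) / len(group) for dim in zip(*group)))
        else:
            new_centers.append(center)  # keep the center of an empty cluster
    return new_centers


def ga_kmeans(points, k, pop_size=20, generations=30,
              mutation_rate=0.1, sigma=0.5, seed=0):
    """Evolve sets of k centers (assumed design); returns (centers, sse)."""
    rng = random.Random(seed)
    # Assumed encoding: an individual is a list of k centers drawn from the data.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    best = min(population, key=lambda ind: sse(points, ind))

    def pick():
        # Binary tournament: the individual with lower SSE (higher fitness) wins.
        a, b = rng.sample(population, 2)
        return a if sse(points, a) <= sse(points, b) else b

    for _ in range(generations):
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            mom, dad = pick(), pick()
            # Single-point crossover on the list of centers.
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = list(mom[:cut]) + list(dad[cut:])
            # Mutation: perturb a center with Gaussian noise.
            child = [
                tuple(x + rng.gauss(0, sigma) for x in c)
                if rng.random() < mutation_rate else c
                for c in child
            ]
            # Local refinement with one k-means update.
            children.append(kmeans_step(points, child))
        population = children
        best = min(population, key=lambda ind: sse(points, ind))
    return best, sse(points, best)


if __name__ == "__main__":
    # Toy stand-in for user access vectors: two well-separated groups.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    centers, error = ga_kmeans(data, k=2)
    print(sorted(centers), error)
```

Because the population starts from many random center sets and the best individual is carried over each generation, the result depends far less on any single initialization than plain k-means does. The price is many SSE evaluations per generation, which is consistent with the slower running time reported above.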
Keywords/Search Tags: Clustering, K-means Algorithm, Web User Clustering, Genetic Algorithm