Research On User Visit Interest Based On Web Log Mining

Posted on: 2015-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: J J Zhao
Full Text: PDF
GTID: 2298330452450795
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, the amount of information on the Internet has reached an unprecedented scale. People can obtain almost any information they want from a computer or a mobile phone. How to extract useful information quickly and accurately from massive data, and how to uncover potentially valuable knowledge and patterns so that the Internet becomes more intelligent and offers a better user experience, have become pressing problems of the Internet era. In this context, Web data mining has emerged as one effective way to address them.

Web data mining comprises three areas: web content mining, web structure mining, and web log mining. This thesis focuses on web log mining. Because web log data is high-dimensional, massive, and semi-structured or unstructured, traditional data mining algorithms cannot meet the performance requirements. The particle swarm algorithm, a swarm intelligence method, is therefore applied to user clustering. Studies have shown that it performs better on high-dimensional data than traditional clustering algorithms.

First, the thesis studies the basic principles of classic clustering algorithms and of the Particle Swarm Optimization (PSO) algorithm, then analyzes and compares the advantages and disadvantages of several classic user clustering algorithms and the particle swarm clustering algorithm. Second, to address the problems of existing clustering algorithms, such as a tendency to fall into local optima and instability on high-dimensional data, an improved PSO algorithm based on K-means is proposed. A divergence measure is defined to determine when the K-means operation runs, so the new algorithm combines the local search capability of K-means with the global search capability of PSO to accelerate convergence and improve the accuracy of the results.
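The hybrid idea described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not define the divergence measure, so `divergence` below is a hypothetical stand-in (the relative spread of fitness values across the swarm). The function names, parameter values, threshold rule, and the choice to refine only the global best are likewise assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fitness(centroids, data):
    """Sum of squared distances to the nearest centroid (lower is better)."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in data)

def kmeans_step(centroids, data):
    """One K-means iteration: assign points, then move centroids to cluster means."""
    clusters = [[] for _ in centroids]
    for p in data:
        idx = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[idx].append(p)
    new = []
    for c, members in zip(centroids, clusters):
        if members:
            new.append([sum(m[d] for m in members) / len(members)
                        for d in range(len(c))])
        else:
            new.append(list(c))  # an empty cluster keeps its centroid
    return new

def divergence(fits):
    """Hypothetical divergence: relative spread of fitness across the swarm."""
    mean = sum(fits) / len(fits)
    return (max(fits) - min(fits)) / mean if mean else 0.0

def hybrid_pso_kmeans(data, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5,
                      threshold=0.1, seed=0):
    """PSO over centroid sets; K-means refines the global best once the swarm tightens."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Each particle encodes k centroids, initialised from randomly chosen data points.
    pos = [[list(rng.choice(data)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[list(c) for c in p] for p in pos]
    pbest_fit = [fitness(p, data) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [list(c) for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        fits = []
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            f = fitness(pos[i], data)
            fits.append(f)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [list(c) for c in pos[i]], f
            if f < gbest_fit:
                gbest, gbest_fit = [list(c) for c in pos[i]], f
        # Assumed trigger: run the local K-means search when divergence is small.
        if divergence(fits) < threshold:
            refined = kmeans_step(gbest, data)
            refined_fit = fitness(refined, data)
            if refined_fit <= gbest_fit:
                gbest, gbest_fit = refined, refined_fit
    return gbest, gbest_fit
```

Calling `hybrid_pso_kmeans(points, k=2)` on a list of equal-length numeric vectors returns the best centroid set found and its fitness. In this sketch PSO supplies the global search, while the K-means step is held back until the swarm's fitness values have drawn close together.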
Third, the thesis introduces the concept of fitness variance so that the inertia weight in the particle swarm algorithm adjusts itself adaptively and nonlinearly with the fitness variance. To avoid the degradation caused by random search, a mutation operation applied with a certain probability is added to reduce the likelihood that clustering falls prematurely into a local optimum. Fourth, inspired by the idea of divide and conquer, the thesis constructs a hierarchical web log mining scheme. After the web log data has been collected and cleaned, transactions identified, and features extracted, the improved hybrid algorithm proposed in the thesis is used to cluster the user data. Association rule mining is then applied to discover user access patterns, which reduces the size and complexity of the association rules. Finally, the experimental results demonstrate that, on high-dimensional web log datasets, the improved algorithm offers high clustering accuracy, fewer iterations, and stable performance, and that it can efficiently uncover salient user access interests.
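The adaptive inertia weight and the probabilistic mutation mentioned above can be illustrated with a short Python sketch. The abstract gives no formulas, so everything here is an assumption: the normalised fitness variance, the exponential mapping from variance to inertia weight, the bounds `w_min` and `w_max`, and the Gaussian perturbation in `maybe_mutate` are hypothetical choices that only show the general shape of such a scheme.

```python
import math
import random

def fitness_variance(fits):
    """Normalised variance of swarm fitness, in [0, 1] (assumed definition)."""
    mean = sum(fits) / len(fits)
    # The scale factor keeps every normalised deviation within [-1, 1].
    scale = max(1.0, max(abs(f - mean) for f in fits))
    return sum(((f - mean) / scale) ** 2 for f in fits) / len(fits)

def adaptive_inertia(fits, w_min=0.4, w_max=0.9):
    """Nonlinear map from fitness variance to inertia weight (assumed form).

    A dispersed swarm (large variance) gets a larger weight for exploration;
    a converged swarm (small variance) gets a smaller one for local search.
    """
    var = fitness_variance(fits)
    return w_min + (w_max - w_min) * (1.0 - math.exp(-var))

def maybe_mutate(position, prob, rng, sigma=0.5):
    """With probability `prob`, perturb each coordinate by Gaussian noise."""
    if rng.random() >= prob:
        return list(position)  # no mutation this time
    return [x * (1.0 + sigma * rng.gauss(0.0, 1.0)) for x in position]
```

In a PSO loop, `adaptive_inertia` would be called once per iteration with the current fitness values. `maybe_mutate` would be applied to a chosen position, for example the global best, to help the swarm escape a premature local optimum.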
Keywords/Search Tags: Web log mining, User clustering, Particle swarm algorithm, Adaptive, K-means