Application Of Improved Clustering Algorithm Based On Hadoop In Web Log Clustering

Posted on: 2019-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ning
GTID: 2428330563990968
Subject: Systems analysis and integration
Abstract/Summary:
With the development of network technologies, the number of Internet users has grown steadily, and so has the volume of Web logs. How to efficiently mine hidden business information from these Web logs and provide users with high-quality service has become an important research direction. This thesis covers the following aspects:

1) An improved PSO-KMeans algorithm based on a simulated annealing mechanism and a roulette strategy. The traditional PSO algorithm is easily trapped in a local optimum. To address this problem, the thesis introduces simulated annealing and a roulette strategy into the traditional PSO algorithm and uses the improved algorithm to select the initial cluster centers of the K-Means algorithm, which improves the stability and accuracy of the clustering.

2) An improved FCM algorithm based on BAS. The traditional FCM algorithm is sensitive to the initial cluster centers and tends to converge to a local optimum. To solve this problem, the thesis uses a logistic model to vary the step size of the BAS algorithm and introduces BAS into FCM to control the updating of the cluster centers. The global optimization ability of BAS is used to achieve the optimization goal.

3) Research on a Web log clustering system based on Hadoop. This part mainly covers Hadoop-based Web log preprocessing and Web log clustering. Web log clustering is divided into two parts: the first combines the improved PSO-KMeans algorithm with the Hadoop platform; the second applies the improved BAS-based FCM algorithm to Web log clustering.

The thesis optimizes Web log preprocessing, the PSO-KMeans algorithm, and the FCM algorithm. Experimental comparison shows that the optimized Web log mining algorithms are greatly improved in both efficiency and stability. The improved PSO-KMeans algorithm is more efficient than the BAS-FCM algorithm, but the BAS-FCM algorithm is more accurate than the improved PSO-KMeans algorithm. Users can therefore choose the algorithm that fits their requirements.
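
The abstract names the simulated annealing mechanism and the roulette strategy in item 1 but does not say how they are wired into PSO. As a rough illustration only, the two textbook building blocks can be sketched as below: fitness-proportional (roulette-wheel) selection and the Metropolis acceptance rule used in simulated annealing. The function names `roulette_select` and `metropolis_accept`, their parameters, and the zero-temperature and zero-fitness fallbacks are assumptions made for this sketch, not the thesis's implementation.

```python
import math
import random


def roulette_select(fitness, rng):
    """Pick an index with probability proportional to its fitness.

    Assumes every fitness value is non-negative (hypothetical helper).
    """
    total = sum(fitness)
    if total <= 0:
        # Degenerate case: no candidate has positive fitness, choose uniformly.
        return rng.randrange(len(fitness))
    threshold = rng.random() * total
    cumulative = 0.0
    for index, value in enumerate(fitness):
        cumulative += value
        if threshold < cumulative:
            return index
    # Guard against floating-point round-off in the running sum.
    return len(fitness) - 1


def metropolis_accept(delta_cost, temperature, rng):
    """Simulated-annealing acceptance test (hypothetical helper).

    Improvements (delta_cost <= 0) are always accepted; a worse candidate
    is accepted with probability exp(-delta_cost / temperature).
    """
    if delta_cost <= 0:
        return True
    if temperature <= 0:
        # Assumed fallback: at zero temperature, never accept a worse move.
        return False
    return rng.random() < math.exp(-delta_cost / temperature)


if __name__ == "__main__":
    rng = random.Random(0)
    # Candidates with higher fitness are drawn more often.
    picks = [roulette_select([1.0, 3.0], rng) for _ in range(1000)]
    print("share of index 1:", picks.count(1) / len(picks))
    # A worse move is accepted more readily at high temperature.
    print(metropolis_accept(1.0, 100.0, rng), metropolis_accept(1.0, 0.01, rng))
```

In a PSO setting, one plausible use is to pick a guiding particle with `roulette_select` rather than always following the single best one, and to let `metropolis_accept` occasionally keep a worse position while the temperature is high, so the swarm can leave a local optimum. Whether the thesis does exactly this is not stated in the abstract.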
Keywords/Search Tags: Web Log Clustering, Hadoop, PSO-KMeans, FCM, BAS