
Parallel Distributed Web Access Patterns Clustering

Posted on: 2020-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: X L Jia
Full Text: PDF
GTID: 2428330602457456
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, data is growing exponentially as a result of widespread resource sharing. In addition, people have a growing demand for personalized web services and intelligent recommendation, so obtaining users' interests through web mining is crucial. Web usage mining analyzes users' interests and behavior patterns using the information in Web pages, which improves the quality of recommendation systems. By analyzing users' access behavior, web clustering groups users with similar browsing patterns into one category, so that personalized services can be provided more accurately.

In traditional clustering algorithms the boundaries between classes are crisp, but in real life the division between classes is fuzzy. Fuzzy clustering methods have therefore been widely used in practice and are the main trend in clustering analysis research. At present, most web log mining methods are based on access frequency, but the information mined by these methods is of little value. The two clustering methods proposed in this thesis are instead based on access time. Fuzzy vectors are used to represent users' browsing patterns; they record whether a user has browsed a page and how long the user stayed on it. Building on the fuzzy rough k-means clustering algorithm, this thesis proposes two improved algorithms.

The main work of this thesis is as follows. First, a two-layer clustering technique based on fuzzy rough k-means and the angle cosine is proposed, which overcomes the slow convergence of the fuzzy rough k-means algorithm. The feasibility of the clustering method is demonstrated by a series of experiments, and its results are verified using the Davies-Bouldin index and compared with other clustering algorithms. Second, because the initial number of clusters must be set manually and the initial cluster centers are chosen at random, the clustering result of the fuzzy rough k-means algorithm is unstable. The algorithm is therefore improved with respect to both the number of clusters and the cluster centers: a better number of clusters is determined from the angle cosine value, and the initial cluster centers are then optimized using angle cosine similarity. The experimental results show that the number of iterations is reduced and clustering efficiency is improved. Finally, the two improved algorithms are inefficient when the data sets are very large, so MapReduce is used to parallelize them. The experimental results show that the two parallel algorithms have good scalability and speedup.
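The abstract describes browsing patterns as fuzzy vectors built from page dwell times and compared with the angle cosine. As a rough illustration only, the Python sketch below shows one way such vectors and their similarity could be computed. The function names, the 300-second cap, and the linear scaling of dwell time to a membership value are assumptions made for this example and are not taken from the thesis.

```python
import math

def to_fuzzy_vector(dwell_seconds, pages, max_seconds=300.0):
    """Map per-page dwell times to membership values in [0, 1].

    Assumption for illustration: membership grows linearly with dwell time
    and is capped at max_seconds. A page that was not visited gets 0.
    """
    return [min(dwell_seconds.get(p, 0.0), max_seconds) / max_seconds
            for p in pages]

def angle_cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical pages and sessions, invented for this example.
pages = ["/home", "/news", "/shop"]
alice = to_fuzzy_vector({"/home": 30, "/news": 120}, pages)
bob = to_fuzzy_vector({"/home": 15, "/news": 60}, pages)
carol = to_fuzzy_vector({"/shop": 200}, pages)

sim_alice_bob = angle_cosine(alice, bob)      # same pages, same time proportions
sim_alice_carol = angle_cosine(alice, carol)  # no pages in common
```

In this sketch, two users who spend time on the same pages in the same proportions have a cosine close to 1, while users with no pages in common have a cosine of 0, which is the kind of signal a cosine-based choice of initial cluster centers could rely on.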
Keywords/Search Tags:Web mining, Fuzzy rough clustering, Web access patterns, Angle cosine, Parallel