
Research And Implementation Of User Behavior Time Series Clustering

Posted on: 2020-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: X J Zeng
Full Text: PDF
GTID: 2428330572973687
Subject: Computer technology
Abstract/Summary:
Analyzing how Internet users behave over time has become a hot topic in user behavior analysis in recent years, and clustering users is a common way to discover the characteristics of their behavior. Existing research on clustering user time series data suffers from poor computing performance or inaccurate distance metrics, and therefore cannot handle large-scale data. To solve these problems, this thesis studies a user behavior time series clustering method based on the symmetric KL distance. It then uses distributed computing and the MapReduce programming model to further improve the computational efficiency of the clustering operations, and finally realizes automatic clustering of user behavior time series. The main research contents of this thesis are:

(1) Building on existing time series clustering research and on the characteristics of network user time series data, a user behavior time series clustering method based on the KL distance is proposed. The KL distance describes the difference between objects in terms of their probability distributions. It can adapt to deformations of the data such as shifting and scaling, it does not rely on the traditional geometric definition of distance, and it describes differences in time distribution more accurately. The experimental results show that the proposed algorithm improves accuracy by 4% compared with algorithms that use the Euclidean and DTW (Dynamic Time Warping) distances, and that its calculation time is an order of magnitude lower than that of clustering algorithms that use medoids as cluster centers.

(2) For real network environments, distributed clustering of massive user behavior time series is achieved. On a distributed platform, MapReduce implementations are designed and implemented for key processes such as converting network user behavior data into time series, automatically tuning the optimal number of clusters, and characterizing the cluster mapping relationship. The results show that the implemented modules can meet the computational demand of large data volumes and greatly reduce the time required for clustering calculations.

In short, this thesis studies and implements a time series clustering technology for user behavior. It addresses three problems in existing research: the difficulty of selecting cluster centers, the low accuracy of similarity measurement, and the high computational complexity. It also provides a means of analyzing network user behavior: extracting the characteristics of user groups through time series clustering can provide a basis for subsequent network service optimization.
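As a rough illustration of the distance idea in the abstract, the following Python sketch computes a symmetric KL distance between two user activity series. It is not the thesis's implementation: the function names, the additive smoothing constant `eps`, and the choice to define the symmetric distance as KL(P||Q) + KL(Q||P) (rather than, say, half that sum) are all assumptions made for this example.

```python
import math


def to_distribution(series, eps=1e-9):
    """Turn a non-negative activity series into a probability distribution.

    `eps` is an assumed smoothing constant that keeps every bin strictly
    positive so the logarithms below are always defined.
    """
    smoothed = [x + eps for x in series]
    total = sum(smoothed)
    return [x / total for x in smoothed]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(series_a, series_b):
    """Symmetric KL distance, assumed here to be KL(P||Q) + KL(Q||P)."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same number of time bins")
    p = to_distribution(series_a)
    q = to_distribution(series_b)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    # Hypothetical hourly activity counts for three users.
    morning_user = [9, 7, 1, 0]
    heavy_morning_user = [90, 70, 10, 0]  # same shape, ten times the volume
    evening_user = [0, 1, 7, 9]

    print(symmetric_kl(morning_user, heavy_morning_user))
    print(symmetric_kl(morning_user, evening_user))
```

Because each series is normalized before comparison, two users with the same activity shape but different overall volume end up very close together, which is one way to read the abstract's remark about adapting to scaling.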
Keywords/Search Tags:time series clustering, user analysis, Kullback-Leibler distance, MapReduce, Hadoop