
Study of Label Propagation Clustering Algorithm Based on Data Features

Posted on: 2020-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2428330596487359
Subject: Engineering · Computer Technology
Abstract/Summary:
Driven by the global wave of informatization, large volumes of structured and semi-structured data have accumulated over time. Data mining is a means of extracting the valuable patterns hidden in such massive, complex data, and cluster analysis, being unsupervised, has become an important research direction within the field. Starting from the definition of clustering, the clustering process, and clustering evaluation indices, this thesis reviews the strengths and weaknesses of different families of classical clustering algorithms and proposes new clustering algorithms based on the idea of label propagation.

Label propagation is an efficient and simple graph-based semi-supervised learning method, but it requires some label information to be supplied as an initial parameter, which limits its adaptability. Building on the idea of label propagation, this thesis therefore proposes a label propagation algorithm based on data point density (NDLP) and a label propagation algorithm based on data point importance (NILP). NDLP determines the initial label information by measuring the density of the data points, then aggregates and iteratively updates labels starting from these initial labels, which completes the clustering. NILP first selects the initial label points according to data point density and then assigns further labels according to data point importance; during label transfer, its label update rules are also defined in terms of data point importance, until the clustering task is complete.

NDLP was evaluated on four synthetic data sets and two real data sets, with Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) as the clustering quality criteria. Compared with four classical clustering algorithms, NDLP scored clearly better on these indices. NILP was first validated on the same data sets and against the same comparison algorithms as NDLP, and then tested on four artificially synthesized data sets containing circular clusters, using the same evaluation indices. The results show that NILP achieved better accuracy and efficiency than the original NDLP algorithm in these experiments.
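The abstract does not specify how NDLP measures density, selects initial labels, or updates them, so the following is only a minimal sketch of the general idea under stated assumptions, not the thesis's algorithm. It assumes (1) local density is the inverse of the mean distance to a point's k nearest neighbours, (2) points whose density outranks all of their k nearest neighbours become the initial labelled seeds, and (3) every other point repeatedly adopts the label with the largest density-weighted vote among its neighbours until no label changes. The function name `density_seeded_label_propagation` and the parameters `k` and `max_iter` are hypothetical.

```python
import math
from collections import defaultdict


def density_seeded_label_propagation(points, k=3, max_iter=100):
    """Hypothetical sketch, not the thesis's NDLP: density-seeded label
    propagation on a k-nearest-neighbour graph. Returns one label per point."""
    n = len(points)
    if n < 2:
        return [0] * n
    k = min(k, n - 1)

    # k nearest neighbours of each point as (distance, index), brute force.
    knn = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        knn.append(dists[:k])

    # Assumed density: inverse of the mean distance to the k nearest neighbours.
    density = [1.0 / (sum(d for d, _ in nb) / k + 1e-12) for nb in knn]

    # Strict ranking by descending density; sorted() is stable, so ties
    # are broken by the smaller index.
    order = sorted(range(n), key=lambda i: -density[i])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r

    # Assumed seed rule: a point that outranks all of its k nearest
    # neighbours is a local density peak and receives a new label.
    labels = [-1] * n
    seeds = set()
    for i in range(n):
        if all(rank[i] < rank[j] for _, j in knn[i]):
            labels[i] = len(seeds)
            seeds.add(i)

    # Iterative update in density order: each non-seed point adopts the
    # label with the largest density-weighted vote among its labelled
    # neighbours. Seeds keep their labels fixed.
    for _ in range(max_iter):
        changed = False
        for i in order:
            if i in seeds:
                continue
            votes = defaultdict(float)
            for _, j in knn[i]:
                if labels[j] != -1:
                    votes[labels[j]] += density[j]
            if votes:
                best = max(votes, key=votes.get)
                if best != labels[i]:
                    labels[i] = best
                    changed = True
        if not changed:
            break
    return labels
```

Because the seeds are chosen automatically as local density peaks, no initial labels have to be supplied, which is the adaptability problem the abstract attributes to standard label propagation. For two well-separated groups of five points each (a unit-spaced square of four points plus its centre), calling the sketch with `k=3` labels the first group 0 and the second group 1.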
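Both evaluation criteria named in the abstract can be computed from the contingency table between the ground-truth classes and the predicted clusters. The pure-Python sketch below is illustrative only: the function names are hypothetical, and NMI is normalised by the arithmetic mean of the two entropies, which is one of several common conventions (the abstract does not say which one the thesis uses).

```python
import math
from collections import Counter


def _comb2(x):
    # Number of unordered pairs that can be drawn from x items.
    return x * (x - 1) / 2


def adjusted_rand_index(labels_true, labels_pred):
    n = len(labels_true)
    if n < 2:
        return 1.0
    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    a = Counter(labels_true)                         # row sums
    b = Counter(labels_pred)                         # column sums
    index = sum(_comb2(c) for c in cells.values())
    sum_a = sum(_comb2(c) for c in a.values())
    sum_b = sum(_comb2(c) for c in b.values())
    expected = sum_a * sum_b / _comb2(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are all singletons or both are
        # a single cluster, i.e. they are identical.
        return 1.0
    return (index - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    # Mutual information: sum of p(u,v) * log(p(u,v) / (p(u) p(v))).
    mi = sum(
        c / n * math.log(c * n / (a[u] * b[v])) for (u, v), c in cells.items()
    )
    h_true, h_pred = entropy(a), entropy(b)
    if h_true + h_pred == 0:
        # Both partitions are a single cluster: treat as perfect agreement.
        return 1.0
    # Arithmetic-mean normalisation: NMI = 2 I / (H(U) + H(V)).
    return 2 * mi / (h_true + h_pred)
```

For example, comparing the ground truth `[0, 0, 1, 1]` with the prediction `[0, 0, 1, 2]` gives ARI = 4/7 ≈ 0.571 and NMI = 0.8. For real experiments, scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` compute the same quantities; the latter also defaults to arithmetic-mean normalisation.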
Keywords: cluster analysis, label propagation, data point density, data point importance