
Research On Semi-supervised Clustering Based On Density Peak And Gravitational Influence Degree And Its Application

Posted on: 2021-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
Full Text: PDF
GTID: 2428330623967323
Subject: Electronic and communication engineering
Abstract/Summary:
Unsupervised clustering algorithms can explore the internal structure of a data set even when the classes of its samples are unknown, and can automatically divide the data objects into categories according to the similarity among samples. The Density Peak Clustering (DPC) algorithm proposed by Rodriguez and Laio in 2014 is simple, efficient and novel. It can automatically identify cluster centers and can handle clusters of different shapes without the number of clusters being set in advance. This provides a novel solution to the practical problem of user clustering and grouping.

In real engineering scenarios, the data set may contain some known and useful information. Semi-supervised clustering algorithms therefore aim to obtain better classification results and better utilization of the whole data set than unsupervised clustering algorithms, by combining a few samples with known category labels with the overall distribution of the many samples without labels. This thesis analyzes the DPC algorithm and finds that the identified cluster centers are inaccurate in some cases, and that when cluster centers are identified automatically, a clustering error on one sample propagates to other samples (the "domino" effect) and reduces the accuracy of the clustering result. In addition, real data sets often contain a number of samples with category label information. This research focuses on semi-supervised clustering with a small amount of label information and on the application of the semi-supervised clustering algorithm to tourist clustering in a tourism recommendation system. The main contributions of this thesis can be summarized as follows:

(1) Manual selection may identify cluster centers inaccurately. To solve this issue, the DPC algorithm is redesigned to use a small amount of category label information. By making full use of the samples with known category labels, computing the Euclidean distances for all candidate cluster centers selected manually from the decision graph of the DPC algorithm, and evaluating each candidate against a distance criterion, accurate cluster centers are obtained through a voting procedure that checks and screens the candidates.

(2) In the DPC algorithm, the assignment of each sample depends on the cluster of its neighboring point with larger local density, which leads to the defect that the "domino" effect may reduce clustering accuracy. To address this, the thesis borrows from the Gravitational Search Algorithm (GSA) the idea that gravitation exists between sample objects. The correlation between samples is measured by the magnitude of the gravitational force: the larger the force between two samples, the more likely they belong to the same cluster. The original assignment method of the DPC algorithm is changed to avoid the "domino" effect, which improves overall clustering accuracy, and a new semi-supervised clustering algorithm based on density peak clustering and gravitational influence degree is proposed. Experimental results on dozens of synthetic data sets and real data sets show that the proposed algorithm is effective and reasonable, and that it achieves more accurate clustering than commonly used semi-supervised clustering methods. The local density in the proposed algorithm is further optimized using the idea of k-nearest neighbors, so that the definition of local density is more reasonable and the overall clustering accuracy is improved.

(3) The new semi-supervised clustering algorithm is applied to the clustering of tourists in a tourism recommendation system, using the real problem of clustering Hainan tourists. Tourist rating information is used to improve overall resource utilization and to obtain the clustering and grouping of tourists. As a result, the proposed algorithm can provide targeted and reasonable suggestions for the construction of Hainan scenic spots.
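
For reference, the baseline DPC procedure of Rodriguez and Laio that the abstract builds on can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the thesis's semi-supervised algorithm: it uses the plain cutoff-kernel density, and it picks centers as the points with the largest product of density and distance, which stands in for the manual decision-graph selection. The names `density_peaks`, `d_c` and `n_centers` are hypothetical and do not come from the thesis.

```python
import numpy as np

def density_peaks(X, d_c, n_centers):
    """Baseline DPC sketch (cutoff kernel). Returns (labels, centers, rho, delta)."""
    n = len(X)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: number of other samples closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit samples from densest to sparsest; the stable sort breaks ties by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest sample: delta is its largest distance to any sample.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]  # samples ranked denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]  # distance to the nearest denser sample
        parent[i] = j
    # Assumption: take the n_centers largest rho * delta values as centers,
    # replacing the manual choice on the decision graph.
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    # Each remaining sample inherits the label of its nearest denser neighbor.
    # One wrong inheritance is copied by every sample downstream of it, which
    # is the "domino" effect described in the abstract.
    for i in order:
        if labels[i] == -1:
            # Fallback for a parentless sample that was not chosen as a center.
            j = parent[i] if parent[i] >= 0 else centers[np.argmin(dist[i, centers])]
            labels[i] = labels[j]
    return labels, centers, rho, delta
```

The final loop is the step the thesis replaces with a gravitation-based measure, and the center selection is the step it corrects using labeled samples; neither modification is reproduced here because the abstract does not give their formulas.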
Keywords/Search Tags:semi-supervised clustering, density peak clustering, gravitational search, tourist cluste