
Research and Application of Travel-Time Based Potential Clustering

Posted on: 2020-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: S T Lu
GTID: 2428330578964284
Subject: Computer Science and Technology
Abstract/Summary:
In 2016, Lu et al. proposed a potential-based clustering algorithm called TTHC (Travel-Time based Hierarchical Clustering). Building on an analysis of the potential values of the data points, TTHC adopts a travel-time based similarity metric that is convenient and fast to compute and yields good clustering quality and accuracy. The algorithm nevertheless has several shortcomings: the number of clusters must be set manually; samples are allocated on similarity alone, ignoring the influence of distance and potential values; and noise points in the dataset cannot be identified, which degrades the clustering result on noisy data. These problems limit the application of TTHC. This thesis addresses them by proposing two algorithms, ACTT and APCTT, and then applies both to text clustering. The work covers the following aspects:

(1) Because TTHC requires the number of cluster centers to be set manually and has problems in assigning data points to categories, a new travel-time based clustering algorithm is proposed that determines the cluster centers automatically. First, the new algorithm calculates the potential value of each data point and the similarity between data points, then determines the parent node of each data point according to similarity, which gives the distance from each point to its parent node. Second, a composite score (the "comprehensive consideration value") is computed from the similarity and distance between each data point and its parent node together with the point's potential value, and the cluster centers are determined automatically from this score. Finally, each remaining data point is assigned to the cluster that has a smaller potential value and the largest similarity to it, which yields the clustering result. Theoretical and experimental comparisons show that the new algorithm not only determines the cluster centers automatically but also improves the sample allocation mechanism, giving a better clustering result.

(2) Because TTHC cannot identify noise points in the dataset and its clustering result is susceptible to them, a new algorithm that can identify noise points is proposed. First, the new algorithm calculates the potential value of each data point and the similarity between data points, then determines the parent node of each data point according to similarity, which gives the distance from each point to its parent node. Second, the ? value is computed from the distance between each data point and its parent node and the point's potential value, and an increasing curve is constructed from the ? values. Noise points are identified by finding the inflection point of this curve and are placed in a separate cluster. Finally, after the noise points are removed, the dataset is hierarchically clustered according to the distance between data points and their parent nodes, which yields the clustering result. Theoretical analysis and experimental results show that the new algorithm can identify the noise points in a dataset and therefore achieves better clustering results and accuracy.

(3) The ACTT and APCTT algorithms are applied to microblog text clustering. Microblog text data is collected and preprocessed, features are extracted, and their weights are calculated. The weight matrix is then reduced in dimension by PCA and clustered. Finally, the clustering results of the different algorithms are analyzed and compared, which demonstrates the practical value of the ACTT and APCTT algorithms proposed in this thesis.
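The noise-recognition steps in aspect (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several pieces are assumptions made only for illustration: the potential is a gravity-like sum of inverse distances, plain Euclidean distance stands in for the travel-time similarity (whose formula the abstract does not give), the score standing in for the ? value is the parent distance multiplied by the potential offset, and the "inflection point" is taken to be the largest jump in the sorted scores. The function names `potentials`, `parents`, and `noise_mask` are hypothetical.

```python
import numpy as np


def potentials(X, delta=1e-3):
    """Gravity-like potential per point (assumed form): lower means denser."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    inv = 1.0 / np.maximum(d, delta)   # clamp tiny distances to avoid blow-up
    np.fill_diagonal(inv, 0.0)         # a point does not act on itself
    return -inv.sum(axis=1), d


def parents(phi, d):
    """Parent = nearest point with strictly lower potential.

    Euclidean distance is a stand-in for the travel-time similarity.
    A point with the lowest potential keeps itself as parent (distance 0).
    """
    n = len(phi)
    parent = np.arange(n)
    dist = np.zeros(n)
    for i in range(n):
        lower = np.flatnonzero(phi < phi[i])
        if lower.size:
            j = lower[np.argmin(d[i, lower])]
            parent[i], dist[i] = j, d[i, j]
    return parent, dist


def noise_mask(phi, dist):
    """Flag points above the largest jump in the sorted scores.

    The score (parent distance times potential offset) is an assumed
    stand-in for the ? value described in the abstract.
    """
    score = dist * (phi - phi.min())
    if score.size < 2:
        return np.zeros(score.size, dtype=bool)
    ordered = np.sort(score)                   # the "increasing curve"
    cut = int(np.argmax(np.diff(ordered)))     # largest jump as the knee
    return score > ordered[cut]


if __name__ == "__main__":
    # Two tight groups of four points plus one isolated point.
    X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
                  [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1],
                  [20, -20]], dtype=float)
    phi, d = potentials(X)
    parent, dist = parents(phi, d)
    print(np.flatnonzero(noise_mask(phi, dist)))
```

The largest-jump rule always flags at least one point, so a real implementation would need a criterion for datasets that contain no noise; the abstract does not say how the thesis handles that case.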
Keywords/Search Tags: potential clustering, hierarchical clustering, travel-time, cluster center, increasing curve of ?, noise recognition, microblog text clustering