Research on a Fuzzy Spectral Clustering Segmentation Algorithm and Its Application to Text Clustering

Posted on: 2017-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Niu
Full Text: PDF
GTID: 2348330503988910
Subject: Computer application technology
Abstract/Summary:
Text clustering is an unsupervised machine learning method that has become a research hotspot in natural language processing in recent years. Traditional clustering algorithms assign each sample point to exactly one cluster. In real life, however, the attribution of many things is not so clear-cut. Fuzzy clustering algorithms offer a way to handle such cases by producing a fuzzy partition of the sample set. Fuzzy C-means (FCM) is the most widely used fuzzy clustering algorithm. FCM is a segmentation clustering algorithm and shares the common shortcomings of that family: it easily converges to a local optimum, it is sensitive to noisy data, and it requires the number of clusters to be specified in advance. Spectral clustering (SC) can cluster sample spaces of arbitrary shape and can obtain a globally optimal solution. By combining SC with FCM, this thesis presents a fuzzy spectral clustering segmentation algorithm (FSC) that can cluster sample spaces of arbitrary shape. An analysis of the membership degrees that FSC assigns to noise points reveals a problem with the normalization constraint on membership degrees, and a method for improving this constraint is proposed. Applying the improved constraint to FSC yields the improved-membership-degree fuzzy spectral clustering segmentation algorithm (IMD-FSC), which addresses the problem of noisy data. Based on the relationship between the eigengap of the Laplacian matrix and the number of clusters, a method for automatically determining the number of clusters is proposed. Applying this method to IMD-FSC yields the adaptive fuzzy spectral clustering algorithm (AIMD-FSC), which determines the number of clusters automatically.
The thesis gives a detailed experimental procedure in which the three methods above (FSC, IMD-FSC, and AIMD-FSC) are applied to text clustering to obtain a fuzzy partition of a text collection. The results are evaluated using precision and recall. The experiments show that the proposed AIMD-FSC algorithm greatly improves clustering quality and has practical application value.
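The noise problem tied to the normalization constraint can be illustrated with the textbook FCM membership formula, in which the memberships of each point must sum to 1. The sketch below is only an illustration under assumptions: the function name `fcm_memberships`, the two cluster centers, and the sample points are hypothetical, and the code shows standard FCM, not the improved constraint proposed in the thesis (which the abstract does not spell out).

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster.

    Uses u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)), so the returned
    memberships always sum to 1 (the normalization constraint).
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# Hypothetical centers and points, chosen only for illustration.
centers = [(0.0, 0.0), (4.0, 0.0)]
near = fcm_memberships((0.5, 0.0), centers)     # clearly inside cluster 0
noise = fcm_memberships((1.0, 100.0), centers)  # far away from both clusters

print("near :", [round(u, 3) for u in near])
print("noise:", [round(u, 3) for u in noise])
```

Because the memberships are forced to sum to 1, the distant point still receives a membership of roughly 0.5 in each cluster, even though it is far from both centers. A noise point therefore carries as much total weight as a typical point when the centers are updated.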
Keywords/Search Tags: document clustering, fuzzy spectral clustering, noise data, automatic determination of the number of clusters
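The idea of determining the number of clusters from the eigengap can be sketched with the common eigengap heuristic: sort the eigenvalues of a normalized graph Laplacian in ascending order and pick the k at which the gap between the k-th and (k+1)-th eigenvalue is largest. This is a minimal sketch under assumptions: the function name `estimate_num_clusters`, the `max_k` parameter, and the sample eigenvalues are hypothetical, and the exact rule used in the thesis is not given in the abstract.

```python
def estimate_num_clusters(eigenvalues, max_k=None):
    """Estimate the cluster count with the eigengap heuristic.

    `eigenvalues` are assumed to come from a normalized graph Laplacian.
    Returns the k (1-based) that maximizes the gap between the k-th and
    (k+1)-th smallest eigenvalues, optionally limited to k <= max_k.
    """
    if max_k is not None and max_k < 1:
        raise ValueError("max_k must be at least 1")
    vals = sorted(eigenvalues)
    if len(vals) < 2:
        return len(vals)
    limit = len(vals) - 1 if max_k is None else min(max_k, len(vals) - 1)
    # gaps[i] is the gap after the (i + 1)-th smallest eigenvalue.
    gaps = [vals[i + 1] - vals[i] for i in range(limit)]
    return gaps.index(max(gaps)) + 1


# Hypothetical spectrum: three eigenvalues near zero, then a clear jump.
sample_eigenvalues = [0.0, 0.01, 0.02, 0.85, 0.9, 0.95]
k = estimate_num_clusters(sample_eigenvalues)
print("estimated number of clusters:", k)
```

With three eigenvalues close to zero followed by a jump, the largest gap falls after the third eigenvalue, so the heuristic suggests three clusters. On real text data the gap is usually less pronounced, so an upper bound such as `max_k` is a reasonable safeguard.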