
Research on Robustness and Cluster Size Sensitivity of Fuzzy C-Means Clustering

Posted on: 2022-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: P Gao
Full Text: PDF
GTID: 2518306524958889
Subject: Traffic and Transportation Engineering
Abstract/Summary:
The main subject of this thesis is the application of fuzzy clustering algorithms to data sets that contain noise and to imbalanced data sets. With the continuing development of science and technology, cluster analysis has proven valuable in many fields, such as data mining, image segmentation, pattern recognition, drug analysis, and machine learning. According to the rules they use to group data, clustering methods can be roughly divided into partition-based, hierarchical, density-based, and grid-based clustering. Among these, partition-based clustering algorithms are widely used because their mathematical models are simple and their results are easy to interpret. However, these algorithms also have shortcomings. They are not robust on data sets that contain noise, and they are sensitive to the variance and size of the clusters in a data set, so they cannot handle imbalanced data sets well. This thesis studies these problems in detail and, building on previous research, proposes its own solutions. The specific work is as follows:
1. Most real-world data contain noise points and outliers, and these abnormal data points degrade clustering results. However, cluster analysis has no clear definition of noise. This part therefore analyzes how data points at different locations influence the clustering results during the clustering process.
2. Most real-world data are imbalanced, for example online fraud links, or normal and abnormal data in case analysis. Fuzzy clustering is sensitive to such data and usually fails to produce good clustering results. This part analyzes the influence of imbalanced data sets on clustering results in terms of the degree of data imbalance.
3. To address these problems, we analyze the reliability of data points based on the current clustering results and propose a reliability-based robust fuzzy clustering algorithm (RRFCM). The traditional fuzzy clustering algorithm considers only the relationship between data points and cluster centers. RRFCM adds a local nearest-neighbor constraint term so that it also considers the relationship between each data point and its surrounding data points. Compared with related algorithms, RRFCM achieves good clustering results on both synthetic and real data sets.
4. Because outliers affect the value to which the objective function converges, we also propose a robust fuzzy clustering algorithm based on the L2,p norm. By introducing the exponent p, it adjusts how heavily data points at large distances are penalized. This reduces the pull these points exert on other data points and thereby reduces their influence.
5. In addition, the computational complexity of both proposed algorithms is analyzed. Because the second algorithm introduces the L2,p norm, its convergence is also analyzed.
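The abstract does not state the objective functions. As a hedged sketch only, assume the common formulation J(U, V) = sum_i sum_k u_ik^m * ||x_i - v_k||_2^p with sum_k u_ik = 1. Under that assumption, p = 2 recovers standard fuzzy c-means (FCM), and p < 2 gives distant points less weight when centers are updated. The membership update below is the exact minimizer for fixed centers. The center update is one iteratively reweighted least squares (IRLS) step. The function name `fcm_l2p`, the random membership initialization, and the stopping rule are illustrative assumptions. This is not the thesis's RRFCM or its exact L2,p algorithm.

```python
import numpy as np


def fcm_l2p(X, c, m=2.0, p=2.0, max_iter=300, tol=1e-6, eps=1e-10, seed=0):
    """Fuzzy c-means with an L2,p distance exponent (hypothetical sketch).

    Minimizes J(U, V) = sum_i sum_k u_ik**m * ||x_i - v_k||_2**p subject to
    sum_k u_ik = 1. p = 2 is standard FCM. Returns (U, V, n_iter).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    # Initial centers: membership-weighted means.
    W = U ** m
    V = (W.T @ X) / W.sum(axis=0)[:, None]
    it = 0
    for it in range(1, max_iter + 1):
        # Euclidean distances d_ik = ||x_i - v_k||, clipped away from zero.
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        D = np.maximum(D, eps)
        # Membership update: u_ik proportional to d_ik**(-p/(m-1)), rows sum to 1.
        inv = D ** (-p / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Center update (one IRLS step): weight each point by u_ik**m * d_ik**(p-2).
        # For p < 2, distant points get smaller weights; for p = 2 this is the FCM mean.
        W = (U ** m) * D ** (p - 2.0)
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return U, V, it


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    blobs = np.vstack([rng.normal([0.0, 0.0], 0.3, (50, 2)),
                       rng.normal([5.0, 5.0], 0.3, (50, 2))])
    noisy = np.vstack([blobs, [[20.0, 20.0]]])  # one distant outlier
    for p in (2.0, 1.0):
        _, V, n_it = fcm_l2p(noisy, c=2, p=p)
        print(f"p={p}: centers={np.round(V[np.argsort(V[:, 0])], 3).tolist()}, iters={n_it}")
```

On two compact blobs plus one far-away point, the outlier should shift the p = 2 centers noticeably more than the p = 1 centers. This illustrates the down-weighting idea and does not reproduce the thesis's experiments.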
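Item 2 uses a "degree of data imbalance" that the abstract does not define. One common measure is the imbalance ratio: the size of the largest class (or cluster) divided by the size of the smallest. This is an assumption here and not necessarily the measure used in the thesis. The helper name `imbalance_ratio` is hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Largest group size divided by smallest group size (hypothetical helper).

    1.0 means perfectly balanced; larger values mean stronger imbalance.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must be non-empty")
    return max(counts.values()) / min(counts.values())


# Example: 950 "normal" records and 50 "fraud" records.
labels = ["normal"] * 950 + ["fraud"] * 50
print(imbalance_ratio(labels))  # 19.0
```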
Keywords/Search Tags: Fuzzy Clustering (FCM), Robustness, Imbalanced Data Set, Image Segmentation, L2,p Norm