
Applications And Research On Possibilistic Fuzzy Kernel Clustering Algorithm Based On Sample-feature Weighted

Posted on: 2014-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: J L Liu
GTID: 2268330422452280
Subject: Computer application technology
Abstract/Summary:
Cluster analysis is a form of multivariate statistical analysis and an important branch of unsupervised pattern recognition. The purpose of clustering is to make the distance between similar samples as small as possible and the distance between dissimilar samples as large as possible. With the continuous development of fuzzy set theory, fuzzy clustering has become the mainstream of cluster analysis. Among fuzzy clustering methods, the fuzzy C-means (FCM) algorithm, which is based on objective function theory, is the most complete and the most widely used. FCM is now widely applied in pattern recognition, data mining and other fields.

Text mining is an important area of data mining research. Text processing requires converting text data from an unstructured form into a structured form that a computer can process directly. At present, it is difficult for computers to resolve semantic ambiguity in natural human language. Therefore, to achieve data mining results that better meet people's practical needs, text mining must draw on knowledge from other fields and be explored in greater depth.
However, applying the FCM algorithm to text mining still raises many problems.

This thesis first analyzes and compares, through experimental simulation, the fuzzy C-means algorithm, the possibilistic C-means algorithm and the possibilistic fuzzy C-means algorithm. Second, it makes the following improvements to address the shortcomings of FCM: (1) Because the traditional fuzzy C-means algorithm is sensitive to the initial cluster centers, the FCM algorithm is run first and its final cluster centers are used as the initial cluster centers of the new algorithm. (2) Fuzzy C-means does not consider that different samples contribute differently to the clustering result, although in practical applications different samples affect the clustering to varying degrees. To address this, the sample memberships are optimized and sample weights are introduced to account for each sample's effect on the clustering, and the approach is evaluated by experimental simulation. (3) Because the classic fuzzy C-means algorithm is sensitive to noisy data and does not take the imbalance among sample features into account, a possibilistic fuzzy kernel clustering algorithm based on sample-feature weighting is proposed. It applies possibilistic clustering to fuzzy clustering, combining it with fuzzy C-means to compute the sample weights and feature weights dynamically during clustering. Using a kernel function, data that are linearly inseparable in the low-dimensional feature space are mapped into a high-dimensional feature space where they become separable, in order to improve clustering accuracy and noise immunity.
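
As an illustration of the kernel idea in improvement (3), the sketch below computes a kernel-induced squared distance without ever forming the high-dimensional mapping. The abstract does not name the kernel used, so the Gaussian kernel, the `sigma` parameter and the function names `gaussian_kernel` and `kernel_sq_distance` are assumptions made for illustration, not the thesis's implementation.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / (2 * sigma^2)).

    The choice of kernel and of sigma is an assumption for this sketch.
    """
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_sq_distance(x, v, kernel=gaussian_kernel):
    """Squared distance in the implicit feature space:

    ||phi(x) - phi(v)||^2 = K(x, x) - 2 * K(x, v) + K(v, v).
    """
    return kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v)


# Hypothetical sample and cluster center, for illustration only.
sample = (0.0, 0.0)
center = (3.0, 4.0)
distance = kernel_sq_distance(sample, center)
```

With a Gaussian kernel, K(x, x) = 1, so this distance reduces to 2(1 - K(x, v)) and never exceeds 2. A distant outlier therefore contributes only a bounded amount, which is one commonly cited reason kernel-based distances can be less sensitive to noise.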
The new algorithm is also compared with the fuzzy C-means algorithm, the possibilistic C-means algorithm and the possibilistic fuzzy C-means clustering algorithm in experiments on UCI data sets, X12 data sets that contain noise, and artificial data sets, in order to validate and analyze its clustering accuracy and noise immunity.
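
For reference, the baseline FCM algorithm that the abstract builds on can be sketched in plain Python. This is a minimal version of standard fuzzy C-means only; it does not include the sample weights, feature weights, possibilistic terms or kernel of the proposed algorithm. The function name `fcm`, its parameters (`m`, `max_iter`, `tol`, `init_centers`, `seed`) and the toy data are assumptions chosen for illustration. The `init_centers` argument shows the warm-start idea from improvement (1): the final centers of one run seed the next.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, init_centers=None, seed=0):
    """Standard fuzzy C-means. Returns (centers, memberships).

    memberships[k][i] is the degree to which sample k belongs to cluster i.
    """
    rng = random.Random(seed)
    # Use the supplied centers if given, otherwise pick c distinct samples.
    if init_centers is not None:
        centers = [list(v) for v in init_centers]
    else:
        centers = [list(p) for p in rng.sample(points, c)]
    n, dim = len(points), len(points[0])
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1)).
        u = []
        for x in points:
            # Clamp distances so a sample sitting on a center does not divide by zero.
            d = [max(math.dist(x, v), 1e-12) for v in centers]
            u.append([1.0 / sum((d[i] / d[j]) ** power for j in range(c))
                      for i in range(c)])
        # Center update: v_i = sum_k u_ki^m * x_k / sum_k u_ki^m.
        new_centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            new_centers.append([sum(w[k] * points[k][t] for k in range(n)) / total
                                for t in range(dim)])
        # Stop once no center moves more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Toy data (hypothetical): two small, well-separated groups.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fcm(data, c=2)
# Warm start: feed the final centers of the first run into a second run.
warm_centers, warm_u = fcm(data, c=2, init_centers=centers)
```

Each row of the returned memberships sums to 1, which is the constraint that distinguishes FCM from possibilistic C-means, where that constraint is relaxed.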
Keywords/Search Tags: sample-weighted, feature-weighted, fuzzy clustering, kernel, Fuzzy C-Means, possibilistic fuzzy clustering, text clustering