
Research On Clustering Algorithms With Feature Preferences And Their Implementation

Posted on: 2016-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Fang
Full Text: PDF
GTID: 2348330479976573
Subject: Computer Science and Technology
Abstract/Summary:
As a powerful tool for mining the structural information in data, clustering has been applied extensively in image processing, bioinformatics, data mining, and other fields. Depending on whether feature weights are introduced into the objective function, clustering algorithms fall into two categories: traditional clustering algorithms and feature-weighted clustering algorithms. Traditional clustering algorithms (e.g. k-means and fuzzy c-means) do not distinguish how much each feature, or its weight, contributes to the clustering. On high-dimensional data, where features are often correlated and redundant, this can lead to unsatisfactory clustering performance. Existing feature-weighted clustering algorithms, in turn, do not necessarily produce weights that match users' expectations about the relative importance of features, that is, their feature preferences. To bridge this gap, this study proposes two kinds of clustering algorithms that fit users' actual feature preferences as closely as possible. The work is summarized below:

1. We improve the CFP algorithm of Sun et al., which is based on Bregman divergence, extending it from globally weighted, cluster-independent features to locally weighted, cluster-dependent features, using the actual preferences given by specific users. The learned weights thereby reflect how much each feature contributes to each individual cluster, which avoids the limitation that the original algorithm uses only globally weighted features. At the same time, the method incorporates feature preference constraints so that the weights obtained during clustering better respect the prior relationships between features. The same strategy can also be extended to related clustering algorithms. Finally, we verify the feasibility of the algorithm through experimental results on UCI datasets.

2. We propose semi-supervised clustering algorithms that combine feature information with label information, expressing the prior information on the feature facet in the form of feature preferences. Conventional semi-supervised clustering algorithms supply supervisory information on only a single facet, either the feature facet or the sample facet. By combining the two kinds of information, this work extends earlier semi-supervised clustering algorithms that were restricted to a single facet. Experiments on datasets verify that it performs better than semi-supervised clustering algorithms restricted to a single facet...
Keywords/Search Tags:Clustering analysis, Feature preferences, Feature weighting, Cluster-dependent, Semi-supervised learning, Semi-supervised clustering
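
To make the distinction between cluster-independent and cluster-dependent feature weights concrete, the following is a minimal sketch of a k-means variant that learns a separate weight vector for each cluster. It is not the CFP algorithm or the thesis's method, and it includes no feature preference constraints or Bregman divergences. It swaps in a generic entropy-regularized weighting rule as an assumption, and the function name `locally_weighted_kmeans` and the parameter `gamma` are hypothetical.

```python
import numpy as np


def locally_weighted_kmeans(X, k, gamma=1.0, n_iter=20, seed=0):
    """Illustrative k-means with one feature-weight vector per cluster.

    Assumption: weights follow a softmax of negative within-cluster
    dispersion (a generic entropy-regularized rule), not the thesis's
    preference-constrained update.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialise centers from k distinct data points, weights uniformly.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full((k, d), 1.0 / d)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Per-feature squared deviations, shape (n, k, d).
        sq = (X[:, None, :] - centers[None, :, :]) ** 2
        # Each cluster measures distance with its own feature weights.
        dist = (sq * weights[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)

        for c in range(k):
            members = X[labels == c]
            if len(members) == 0:
                continue  # empty cluster: keep its old center and weights
            centers[c] = members.mean(axis=0)
            # Features with low dispersion inside this cluster get more weight.
            disp = ((members - centers[c]) ** 2).sum(axis=0)
            logits = -disp / gamma
            logits -= logits.max()  # numerical stability
            w = np.exp(logits)
            weights[c] = w / w.sum()

    # Labels come from the last assignment step of the loop.
    return labels, centers, weights


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(5, 1, (30, 3))])
    labels, centers, weights = locally_weighted_kmeans(X, k=2)
    print(weights)  # one row of feature weights per cluster
```

Each row of `weights` sums to 1 and belongs to one cluster, so the same feature can matter a great deal in one cluster and very little in another. A globally weighted method would instead learn a single row shared by all clusters.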