
Research On Clustering Ensemble Based On Feature Relationships

Posted on: 2018-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Jiang
GTID: 2348330512987157
Subject: Software engineering
Abstract/Summary:
Clustering analysis is a machine learning method of high practical value. It is primarily used to partition a data set into groups with distinguishable boundaries. Clustering strategies are sensitive to the nature of the data, so no single general method can handle every kind of data set. To address this problem, many researchers have proposed and studied ensemble learning as a way to improve clustering analysis, with good results. However, most of this research has focused on implementing ensemble learning algorithms, and little attention has been paid to the data itself. In machine learning, the nature of the data strongly affects the final learning quality. This is especially true when a data set has many features with complicated correlations, where feature engineering can significantly improve learning quality. This thesis therefore explores clustering ensembles based on data features in the following aspects:
1. For the step that generates clustering members in the ensemble process, we propose a method of generating feature subsets. The method reduces the correlations among features, increases the diversity of the clustering members, and yields better clustering quality.
2. For the step that merges clustering members in the ensemble process, we propose four weight calculation methods to evaluate the quality of clustering members. These methods produce better clustering results that fit the nature of each data set.
3. For clustering ensemble strategies implemented through iterative optimization, we study a traditional Boosting-based clustering ensemble method. We analyze in depth the difficulties of applying Boosting to clustering analysis. We then investigate how to improve it from the perspective of data features. Compared with the traditional method, the improved method judges the clustering quality of the data more stably and runs faster.
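The abstract does not give the thesis's feature-subset rule, its four weight formulas, or its merging function. The sketch below therefore only illustrates the general shape of aspects 1 and 2 under loudly stated assumptions:

- Features are split greedily so that strongly correlated features tend to land in different subsets.
- Each subset is clustered with plain k-means to form one ensemble member.
- Each member is weighted by the fraction of total scatter explained by its clusters. This is a hypothetical stand-in and is not one of the thesis's four weighting methods.
- Members are merged through a weighted co-association matrix thresholded at 0.5.

All function names (`low_correlation_subsets`, `separation_weight`, `weighted_consensus`, `feature_subset_ensemble`) are hypothetical. The Boosting-based method of aspect 3 is not covered.

```python
import numpy as np


def low_correlation_subsets(X, n_subsets, rng):
    """Hypothetical greedy heuristic: split feature indices into groups so that
    strongly correlated features tend to fall into different groups."""
    # Absolute Pearson correlation between features; NaN (constant feature) -> 0.
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    groups = [[] for _ in range(n_subsets)]
    # A random visiting order yields different subsets per round (member diversity).
    for f in rng.permutation(X.shape[1]):
        # Cost of a group = strongest correlation between f and its current members.
        costs = [corr[f, g].max() if g else 0.0 for g in groups]
        groups[int(np.argmin(costs))].append(int(f))
    return [g for g in groups if g]


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means with farthest-first initialisation."""
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        d = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def separation_weight(X, labels):
    """Stand-in member quality: 1 - within-cluster scatter / total scatter."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                 for c in np.unique(labels))
    return 1.0 - within / total if total > 0 else 0.0


def weighted_consensus(label_sets, weights, threshold=0.5):
    """Merge members via a weighted co-association matrix, then take connected
    components of the graph whose edges have co-association above threshold."""
    total = float(np.sum(weights))
    if total <= 0:  # fall back to equal weights if every member scored zero
        weights, total = [1.0] * len(label_sets), float(len(label_sets))
    n = len(label_sets[0])
    A = np.zeros((n, n))
    for labels, w in zip(label_sets, weights):
        A += w * (labels[:, None] == labels[None, :])
    A /= total
    final = -np.ones(n, dtype=int)
    cid = 0
    for s in range(n):
        if final[s] >= 0:
            continue
        final[s], stack = cid, [s]
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            for j in np.flatnonzero(A[i] > threshold):
                if final[j] < 0:
                    final[j] = cid
                    stack.append(j)
        cid += 1
    return final


def feature_subset_ensemble(X, k, n_subsets=3, n_rounds=4, seed=0):
    """Generate members on low-correlation feature subsets, weight them, merge."""
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardise features
    members, weights = [], []
    for _ in range(n_rounds):
        for subset in low_correlation_subsets(X, n_subsets, rng):
            Xs = X[:, subset]
            labels = kmeans(Xs, k, rng)
            members.append(labels)
            weights.append(separation_weight(Xs, labels))
    return weighted_consensus(members, weights)
```

For example, two well-separated Gaussian groups in six features, passed through `feature_subset_ensemble(X, k=2)`, should come back as two consensus clusters. The weighted co-association step is used here because it merges members without having to align their label permutations. It is only one of several possible merging functions.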
Keywords/Search Tags:Machine learning, Clustering analysis, Ensemble learning, Feature engineering, Boosting