Research On Clustering Algorithm Based On Multiple Classifiers Combination

Posted on: 2008-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2178360215983884
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of network and database technology, data mining has emerged. Clustering is one of the important tasks in data mining and an important method for partitioning or grouping data. It has been applied in commerce, market analysis, biology, the Web, classification and other areas.

Because mixed datasets are complex, few traditional clustering algorithms are suited to them, and the clustering results are often poor. In addition, setting the number of clusters has always been a difficult problem in clustering. Ensemble learning has recently been applied successfully to classification and prediction, and the technology of combining multiple classifiers has matured. Because prior information about the data is unavailable in clustering, clustering ensembles have attracted research attention only in recent years, and many problems in this field deserve deeper study. So far, most clustering ensemble algorithms use a parallel-connection ensemble mode, which must match and integrate the clustering results of the component clusterers, so the time complexity is high. The problem of setting the number of clusters also remains: the cluster number of each component clusterer, the final cluster number, and the relation between them are especially difficult to decide.

To address these problems, this thesis draws on multiple-classifier combination techniques and uses k-prototype as the base clustering algorithm to design a multi-hierarchical clustering ensemble algorithm. The algorithm is suited to clustering mixed datasets. Its ensemble mode is series-connection, so it avoids the matching and integration step. It requires only an estimated value for the number of clusters; as clustering layers are added, it adjusts the cluster number adaptively. We test the algorithm on standard datasets from UCI. The experimental results show that the algorithm achieves high clustering precision, clearly improves clustering results on mixed datasets, has lower time complexity, and scales well. The algorithm can also be used for classification and prediction.
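
As a small illustration of the base algorithm named above, the following sketch shows the mixed-attribute dissimilarity commonly used by k-prototype clustering: squared Euclidean distance on the numeric attributes plus a weighted count of mismatches on the categorical attributes. It does not reproduce the thesis's multi-hierarchical ensemble, which the abstract does not specify. The function names, the weight `gamma`, and the sample values are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def k_prototypes_distance(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    proto_num: Sequence[float],
    proto_cat: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Dissimilarity between one record and one prototype on mixed data.

    gamma (an assumed, user-chosen weight) balances the categorical
    mismatch count against the numeric squared distance.
    """
    # Numeric part: squared Euclidean distance.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: number of attributes whose values differ.
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


def nearest_prototype(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    prototypes: List[Tuple[Sequence[float], Sequence[str]]],
    gamma: float = 1.0,
) -> int:
    """Return the index of the prototype closest to the record."""
    distances = [
        k_prototypes_distance(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(distances)), key=distances.__getitem__)


if __name__ == "__main__":
    # Hypothetical prototypes: (numeric attributes, categorical attributes).
    prototypes = [([5.0], ["x"]), ([1.0], ["y"])]
    print(nearest_prototype([0.0], ["x"], prototypes, gamma=2.0))
```

In a full k-prototype run, this assignment step would alternate with a prototype update step (means for the numeric attributes, modes for the categorical ones) until the assignments stop changing.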
Keywords/Search Tags: Data Mining, Clustering, Multiple Classifiers Combination, Clustering Ensemble