Research On Selective Clustering Ensemble Algorithm Based On Fractal Dimension

Posted on: 2016-06-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X X Wu
Full Text: PDF
GTID: 1108330473461660
Subject: Management Science and Engineering
Abstract/Summary:
Clustering is the process of grouping data objects so that objects within the same class are as similar as possible and objects in different classes are as dissimilar as possible. It is regarded as a form of unsupervised learning. Cluster analysis has broad application prospects in data mining, pattern recognition, statistics, and many other fields, and has long been an active research topic in machine learning. For a given data set, choosing a suitable clustering algorithm remains a central research question. A clustering ensemble combines multiple clustering results through a consensus function so that the information in the existing results is shared as fully as possible; it can provide more accurate and more stable results than a single clustering algorithm. This dissertation combines clustering ensembles with fractal data mining and proposes a clustering ensemble algorithm based on fractal dimension. For big data settings, the dissertation extends this algorithm to run in a cloud computing environment.

Traditionally, all available clustering solutions are combined to produce the final ensemble result. Inferior clustering results therefore take part in the fusion process, which interferes with the accuracy of the integration and lowers the quality of the final clustering. In supervised learning, selective classification ensemble algorithms can achieve better results. Inspired by the selective classification ensemble, the idea of selective ensembles is introduced into clustering ensembles, yielding the selective clustering ensemble algorithm.
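As an illustration of how a consensus function can combine several partitions, the following is a minimal sketch that builds an unweighted co-association matrix and reads clusters off it. It is not the dissertation's algorithm: the function names `co_association` and `consensus_labels`, the 0.5 threshold, and the connected-component step are all assumptions chosen for brevity.

```python
def co_association(partitions):
    """Return the fraction of partitions in which each pair shares a cluster."""
    n = len(partitions[0])
    share = 1.0 / len(partitions)
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += share
    return co


def consensus_labels(co, threshold=0.5):
    """Link objects whose co-association exceeds the threshold (assumed value);
    each connected component of the resulting graph becomes one cluster."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base partitions of six objects. Label values are
# arbitrary per partition; only co-membership matters.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
co = co_association(partitions)
final = consensus_labels(co)
```

Because the matrix records only whether two objects share a cluster, partitions with different label names or different numbers of clusters can be combined directly.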
Fractal data mining is introduced after the clustering members have been generated, and a clustering algorithm based on fractal dimension and projection is proposed on top of the fractal-dimension-based selective clustering ensemble algorithm in order to improve the accuracy of the clustering members.

The research content and contributions of this dissertation are as follows:

(1) The traditional K-means clustering algorithm is suited to finding spherical clusters. In view of this limitation, this dissertation proposes a clustering ensemble algorithm based on fractal dimension. It combines the ensemble idea with single fractal clustering; compared with the single fractal clustering algorithm, it improves the accuracy of the clustering results and can discover clusters of arbitrary shape and clusters that lie close to one another.

(2) Traditional clustering algorithms are not suited to the massive, high-dimensional data found in practical applications. To cluster massive data, this dissertation proposes a fractal dimension clustering ensemble algorithm for the cloud computing environment that uses the parallel computing capability of a cluster system.

(3) Traditional clustering ensemble algorithms cannot eliminate the influence of inferior cluster members and consequently have lower clustering accuracy. To solve these problems, this dissertation proposes a selective clustering ensemble algorithm based on fractal dimension. First, the fractal-dimension-based clustering algorithm, which is suited to clusters of arbitrary shape, is used to perform incremental clustering; then, following a selection strategy based on normalized mutual information, the algorithm selects high-quality cluster members and integrates them with a weighted co-association matrix to obtain the final clustering result.
Experiments comparing this algorithm with the traditional clustering ensemble algorithm confirmed that the new algorithm improves clustering quality and has good scalability.

(4) For high-dimensional data clustering, this dissertation proposes a novel selective clustering ensemble algorithm based on fractal dimension and projection. First, the clustering members are generated by a clustering algorithm based on fractal dimension and projection, which performs dimension reduction and clustering; then a selection strategy based on the best reference partition picks a subset of high-quality cluster members as the components of the ensemble, which are combined using a weighted co-association matrix to obtain the final clustering result. Experimental results on UCI data sets verify the validity of the proposed algorithm for high-dimensional data clustering. The new algorithm achieves statistically significant performance improvements over other clustering algorithms.

(5) To meet the needs of a project, the selective clustering ensemble algorithm built on the above research is applied to meteorological data. The clustering results obtained from mining the meteorological data are used to produce a national climatic regionalization.
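The selection and weighting steps named in the abstract (normalized mutual information, weighted co-association matrix) can be sketched roughly as follows. The abstract does not give the exact selection rule or weighting formula, so everything specific here is an assumption: members are scored by their mean NMI against the other members, the top `keep` are retained, and their normalized scores serve as weights. The helper names and the 0.5 threshold are hypothetical.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalized mutual information between two label lists (geometric-mean form)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in ca.values())
    h_b = -sum(c / n * math.log(c / n) for c in cb.values())
    if h_a == 0 or h_b == 0:
        return 1.0 if h_a == h_b else 0.0
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    return mi / math.sqrt(h_a * h_b)


def select_members(partitions, keep):
    """Assumed rule: rank members by mean NMI with the others, keep the top ones."""
    scores = []
    for i, p in enumerate(partitions):
        others = [nmi(p, q) for j, q in enumerate(partitions) if j != i]
        scores.append(sum(others) / len(others))
    ranked = sorted(range(len(partitions)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:keep]
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]  # weights sum to 1
    return chosen, weights


def weighted_co_association(partitions, weights):
    """Each partition adds its weight to every pair of objects it groups together."""
    n = len(partitions[0])
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w
    return co


def consensus_labels(co, threshold=0.5):
    """Connected components of the graph of pairs above the (assumed) threshold."""
    labels, current = [-1] * len(co), 0
    for start in range(len(co)):
        if labels[start] != -1:
            continue
        labels[start], stack = current, [start]
        while stack:
            i = stack.pop()
            for j in range(len(co)):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Four hypothetical members; the last one disagrees with the rest.
members = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [2, 2, 2, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
]
chosen, weights = select_members(members, keep=3)
co = weighted_co_association([members[i] for i in chosen], weights)
final = consensus_labels(co)
```

In this toy example the fourth member has low agreement with the others and is dropped before fusion, which is the effect the selective strategy aims for.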
Keywords/Search Tags: fractal dimension, reference partition, selection strategy, co-association matrix, clustering ensemble, selective clustering ensemble, cloud computing, Hadoop