The Improvement Of The Hierarchical Clustering Algorithm

Posted on: 2015-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Sun
GTID: 2348330518470349
Subject: Signal and Information Processing
Abstract/Summary:
Cluster analysis, an important branch of data mining, is widely used in fields such as medical diagnostics, image processing, information retrieval, data compression, and machine vision, and its prospects are attracting growing attention. With the advent of the information age, the amount of data people collect keeps growing and the information hidden in it becomes increasingly complex, so there is strong demand for simple and efficient clustering tools for data extraction and analysis. The study of clustering algorithms therefore has important practical significance.

Among current clustering algorithms, hierarchical clustering is widely used because of its simple principle and logic and its accurate clustering results. However, its iterative computation leads to high time and space complexity, which makes it unsuitable for large-scale data. In addition, the algorithm is sensitive to outliers and handles data sets that contain them poorly. To address these deficiencies of hierarchical clustering, this thesis carries out the following work:

(1) To address the sensitivity of hierarchical clustering to outliers, an improved algorithm combined with an energy field, named EFHC, is proposed. The algorithm introduces the concept of a gravitational field into outlier detection: each data point is treated as a field source with a certain energy value. Because isolated points have lower field energy than ordinary data points, the algorithm can identify and remove them, which effectively eliminates the isolated points in the data set and improves clustering accuracy.

(2) To address the high computational complexity of hierarchical clustering, a new algorithm combined with data segmentation, named DHC, is proposed. A large data set is divided into small blocks, each block is clustered in turn to obtain representative information for it, and these representatives are then integrated to cluster the original data set. Comparative experiments on UCI data sets and synthetic data sets show that the new algorithm greatly reduces the time complexity compared with the original algorithm, and that the clustering results improve accordingly.
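The outlier-removal idea in (1) can be illustrated with a minimal sketch. The abstract does not give the energy formula or the removal threshold, so both are assumptions here: the energy of a point is taken as the sum of inverse squared distances to all other points (a gravity-like form), and a point is treated as isolated when its energy falls below a fixed fraction of the median energy. The names `field_energy`, `remove_isolated`, and `ratio` are hypothetical and are not taken from the thesis.

```python
import math
from statistics import median

def field_energy(points, eps=1e-9):
    """Energy of each point under an ASSUMED gravity-like form:
    the sum of 1 / (distance^2 + eps) over all other points."""
    energies = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:
                total += 1.0 / (math.dist(p, q) ** 2 + eps)
        energies.append(total)
    return energies

def remove_isolated(points, ratio=0.1):
    """Split points into (kept, isolated).
    ASSUMED rule: a point is isolated if its energy is below
    ratio * median energy of the data set."""
    energies = field_energy(points)
    cutoff = ratio * median(energies)
    kept = [p for p, e in zip(points, energies) if e >= cutoff]
    isolated = [p for p, e in zip(points, energies) if e < cutoff]
    return kept, isolated

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    kept, isolated = remove_isolated(data)
    print("kept:", kept)
    print("isolated:", isolated)
```

In this sketch the cleaned list `kept` would then be passed to an ordinary hierarchical clustering routine. The pairwise energy computation is quadratic in the number of points, so it only illustrates the principle.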
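The divide-and-integrate idea in (2) can be sketched in a similar way. The abstract does not say how blocks are formed, what the representative information is, or how it is integrated, so this sketch makes assumptions throughout: blocks are consecutive slices of the input, each block is reduced by naive centroid-linkage agglomeration, the representatives are unweighted cluster centroids, and each original point is finally assigned to the nearest merged centroid. All function and parameter names are hypothetical.

```python
import math

def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(pts[0])
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))

def agglomerate(points, k):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    whose centroids are closest until at most k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

def segmented_clustering(points, k, block_size, reps_per_block):
    """Return one cluster label per input point."""
    # 1. Divide the data set into small consecutive blocks (assumed split).
    blocks = [points[s:s + block_size]
              for s in range(0, len(points), block_size)]
    # 2. Cluster each block and keep the centroids as its representatives.
    reps = []
    for block in blocks:
        for cluster in agglomerate(block, min(reps_per_block, len(block))):
            reps.append(centroid(cluster))
    # 3. Integrate: cluster the representatives into k final groups.
    final = [centroid(c) for c in agglomerate(reps, k)]
    # 4. Assign every original point to its nearest final centroid.
    return [min(range(len(final)), key=lambda c: math.dist(p, final[c]))
            for p in points]

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10),
            (10, 11), (11, 10), (1, 1), (11, 11)]
    print(segmented_clustering(data, k=2, block_size=4, reps_per_block=2))
```

The saving comes from running the quadratic pairwise search only inside small blocks and over the much shorter list of representatives, rather than over the whole data set at once.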
Keywords/Search Tags:Cluster analysis communication, Hierarchical clustering technology, Data segmentation, Field energy, Isolated points