
Research On Practical Hierarchical Clustering Algorithm And Its Applications

Posted on: 2018-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Zhang
Full Text: PDF
GTID: 2348330518476637
Subject: Control Science and Engineering
Abstract/Summary:
As a product of social informatization and the data explosion, data mining discovers implicit knowledge in data and extracts useful information from chaotic data. Cluster analysis is one of the important methods of data mining; it uses unsupervised algorithms to find the inherent structure of the data itself. As research on cluster analysis has deepened, it has been widely adopted in both theoretical research and practical applications. Clustering methods mainly include partitioning methods, hierarchical clustering, density-based clustering, model-based methods, and others. Each kind of method, however, has its own application domain, and none adapts to a wide variety of data types and application areas. To address this, the thesis proposes a practical hierarchical clustering algorithm that can be applied to a variety of data types and applications. Experimental analysis and practical applications support the validity of the algorithm. The main work and results can be summarized as follows:

1. The typical methods of cluster analysis and their advantages and disadvantages are analyzed, with a focus on K-means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). The thesis then analyzes the requirements placed on a clustering algorithm and how to evaluate an algorithm's effectiveness from multiple angles.

2. A practical hierarchical clustering algorithm is proposed to address the shortcomings of K-means and DBSCAN. The ideas of data competition and link weight are introduced, and the clustering process is divided into two stages: forming small clusters and merging small clusters. The existence and increasing criteria for the link weight ensure that the small clusters are reasonable.

3. Simulations on five data sets examine the algorithm from several angles: clustering accuracy, clustering time, the ability to handle complex data shapes (both convex and non-convex), and external criteria for evaluating clustering quality. The simulations also indicate that the proposed algorithm can solve the clustering problem for high-dimensional data such as the Iris data set.

4. Combining the clustering algorithm with PCA, a PCA-hierarchical clustering method is proposed. Two examples, coronary heart disease data and hepatitis pathology data, demonstrate the effectiveness of the practical hierarchical clustering algorithm and of PCA-hierarchical clustering. The results show that both methods can find the small cluster types implied in the pathological data and thus support knowledge discovery.
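
To make two ideas from the abstract concrete (bottom-up merging of clusters, and scoring a result with an external criterion), here is a minimal Python sketch. It is not the algorithm proposed in the thesis: the abstract does not specify data competition or link weights in enough detail to reproduce, so plain single-linkage agglomerative clustering is used as a stand-in, and the Rand index is assumed as one common external criterion. The function names `single_linkage` and `rand_index` and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def single_linkage(points, n_clusters):
    """Plain single-linkage agglomerative clustering.

    Stand-in for illustration only; NOT the thesis's proposed algorithm.
    """
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest cluster pair
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the closest pair of members.
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge cluster b into cluster a
        del clusters[b]                  # b > a, so index a stays valid
    # Convert the list of clusters into one label per point.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two points
    in the same cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy data, made up for illustration: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
truth = [0, 0, 0, 1, 1, 1]

pred = single_linkage(points, n_clusters=2)
score = rand_index(truth, pred)
print(pred, score)
```

On this toy data the two groups are recovered exactly, so the Rand index is 1.0. The pairwise search is cubic or worse in the number of points, which is acceptable for a sketch but not for the data sizes a real evaluation would use.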
Keywords/Search Tags:Data mining, hierarchical clustering, PCA, pathology data, knowledge discovery