| Clustering analysis, an unsupervised learning method, is one of the main research directions in data mining. Within clustering analysis, hierarchical clustering handles noisy data and isolated points effectively and does not depend on initial values. It has been widely used in biomedicine, disease diagnosis, and classification. A key problem in hierarchical clustering is how the distance matrix is measured. Unlike the Euclidean distance, mutual information can capture nonlinear relations effectively. However, biomedical data sets are typically high-dimensional with small sample sizes, which causes traditional mutual information estimators to deviate substantially and thus undermines the validity of mutual information clustering. To estimate mutual information accurately and improve hierarchical clustering on such data sets, this paper first adopts the Grassberger entropy estimator, which is suited to small data sets, and derives a mutual information estimation method from it. It then proposes an improved algorithm, mutual information hierarchical clustering based on the Grassberger entropy estimator (G-MIHC). The effectiveness of the proposed algorithm is verified on four high-dimensional, small-sample biomedical data sets. The experimental results show that it performs better than K-means, agglomerative hierarchical clustering based on Euclidean distance, and traditional mutual information hierarchical clustering based on the naïve and Miller-adjusted estimators, and thus addresses the above problems to some extent. |
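
The abstract does not spell out the estimator, so the following is only a minimal sketch of how a Grassberger-style mutual information estimate might be computed. It assumes the form H ≈ ln N − (1/N) Σ nᵢ G(nᵢ) commonly attributed to Grassberger's 2003 estimator, assumes the variables have already been discretized into symbols, and assumes the plug-in identity I(X;Y) = H(X) + H(Y) − H(X,Y); the paper's own derivation may differ. The function names `grassberger_g`, `grassberger_entropy`, and `grassberger_mi` are hypothetical.

```python
import math
from collections import Counter

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def grassberger_g(n):
    """G(n) via the recurrence G(1) = -gamma - ln 2 and
    G(2k) = G(2k+1) = G(1) + sum_{j<k} 2 / (2j + 1)."""
    if n < 1:
        raise ValueError("count must be a positive integer")
    return -EULER_GAMMA - math.log(2) + sum(2.0 / (2 * j + 1) for j in range(n // 2))


def grassberger_entropy(symbols):
    """Entropy estimate in nats: ln N - (1/N) * sum_i n_i * G(n_i).
    Assumes `symbols` is a non-empty sequence of discrete (hashable) values."""
    counts = list(Counter(symbols).values())
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one observation")
    return math.log(total) - sum(c * grassberger_g(c) for c in counts) / total


def grassberger_mi(x, y):
    """Mutual information estimate H(X) + H(Y) - H(X, Y), with each entropy
    estimated by grassberger_entropy (an assumed plug-in combination)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    joint = list(zip(x, y))
    return grassberger_entropy(x) + grassberger_entropy(y) - grassberger_entropy(joint)


# Illustrative, made-up discretized samples.
x = [0, 0, 1, 1, 2, 2, 0, 1]
y = [1, 1, 0, 0, 2, 2, 1, 0]
print(grassberger_mi(x, y))
```

An estimate like this could then be turned into a pairwise dissimilarity and passed to a standard agglomerative clustering routine; how G-MIHC converts mutual information into a distance is not stated here, so that step is left out.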