The Study Of Clustering Based On Unsupervised Decision Tree

Posted on: 2011-04-26 | Degree: Master | Type: Thesis
Country: China | Candidate: C C Zhang
GTID: 2178360308954331 | Subject: Computer application technology
Abstract/Summary:
A decision tree is a supervised inductive learning algorithm used to classify labeled data sets. Clustering, by contrast, is an unsupervised learning method: it divides an unlabeled data set into smaller groups so that similarity within each group is high and similarity between groups is low. Since the start of the 21st century, a growing number of researchers have worked on integrating the two methods.

In this thesis, we propose a method named new-style clustering based on unsupervised decision tree (NCUDT). The method divides a data set into clusters by learning from unlabeled data. In effect, the clustering process is the process of building an unsupervised decision tree. At each node of the tree, we select the attribute with respect to which the dispersion, or inhomogeneity, of the data set is greatest, and partition the data at that node into segments using improved valley detection. The algorithm provides stopping criteria that restrict the growth of the tree. The leaf nodes of the resulting tree are the clusters.

Experiments show that NCUDT achieves better classification accuracy than C4.5 and k-means, and that the tree built by NCUDT is smaller than the tree built by C4.5. Finally, we analyze the time complexity of the algorithm; a comparison with other methods shows that NCUDT is more efficient.
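
The node-splitting procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the dispersion measure or what the "improved" valley detection consists of, so this sketch makes its own assumptions: it uses plain histogram valley detection, and it scores each attribute by the depth of its deepest valley as a stand-in for inhomogeneity. The names `deepest_valley` and `cluster`, and the parameters `bins`, `min_depth`, and `min_size`, are hypothetical and do not come from the thesis.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]


def deepest_valley(values: Sequence[float], bins: int = 10,
                   min_depth: int = 2) -> Optional[Tuple[float, int]]:
    """Return (threshold, depth) for the deepest histogram valley, or None.

    A valley is an interior bin whose count is lower than the highest bin
    on each side; its depth is the gap to the lower of those two peaks.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return None  # constant attribute: nothing to split on
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    best: Optional[Tuple[float, int]] = None
    for i in range(1, bins - 1):
        depth = min(max(counts[:i]), max(counts[i + 1:])) - counts[i]
        if depth >= min_depth and (best is None or depth > best[1]):
            best = (lo + (i + 0.5) * width, depth)  # threshold at bin centre
    return best


def cluster(rows: List[Row], min_size: int = 5, bins: int = 10) -> List[List[Row]]:
    """Recursively split rows; the leaves of the implicit tree are the clusters."""
    if len(rows) < 2 * min_size:
        return [rows]  # assumed stop criterion: node too small to split
    candidates = []
    for attr in range(len(rows[0])):
        found = deepest_valley([r[attr] for r in rows], bins)
        if found is not None:
            candidates.append((found[1], attr, found[0]))
    if not candidates:
        return [rows]  # assumed stop criterion: no attribute shows a valley
    _, attr, threshold = max(candidates)  # attribute with the deepest valley
    left = [r for r in rows if r[attr] < threshold]
    right = [r for r in rows if r[attr] >= threshold]
    if len(left) < min_size or len(right) < min_size:
        return [rows]
    return cluster(left, min_size, bins) + cluster(right, min_size, bins)
```

Calling `cluster(rows, min_size=3)` on two well-separated groups of points should return one list of rows per group. A binary split at the single deepest valley is a simplification; the abstract speaks of partitioning into segments, which may mean splitting at several valleys at once.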
Keywords/Search Tags: Classification, Decision tree, Clustering, Unsupervised learning