
Data Mining Research Based on Bisecting K-means and SVM Decision Tree Algorithm

Posted on: 2013-10-01  Degree: Master  Type: Thesis
Country: China  Candidate: J Zhang  Full Text: PDF
GTID: 2248330374461969  Subject: Computer software and theory
Abstract/Summary:
Using computers to acquire and process data is an extremely important application of information technology. Data collection often yields high-dimensional data with a large number of features, and such data cause the "curse of dimensionality" in data mining. The analysis and study of algorithms for effective data mining on high-dimensional datasets is therefore very important. Building on dimensional classification and existing data mining algorithms, this thesis reviews the research focus, the current state of related research, and the theoretical methods, and carries out further research on that basis. The major contributions of this thesis are as follows:

(1) We discuss the shortcomings of existing algorithms for data mining on high-dimensional datasets and the importance of adaptive dimensionality reduction algorithms. First, we introduce the problems of high-dimensional classification and the research focus, and in particular explain the importance of dimensionality reduction in the data mining process. Second, we describe the shortcomings of the existing support vector machine (SVM) classification algorithm and the K-means algorithm when applied to high-dimensional datasets, which lays a foundation for improving the data mining algorithm.

(2) We design an effective two-stage data mining algorithm for multi-class classification of high-dimensional datasets. First, the high-dimensional dataset is roughly clustered using the bisecting K-means algorithm; the resulting clusters are then separated using the SVM decision tree algorithm, which achieves multi-class classification of high-dimensional data.
This method not only effectively reduces the training time for classifying high-dimensional data; the experimental results also show that its classification accuracy is higher than that of the K-means clustering algorithm or the SVM algorithm, and that it reduces time complexity.

(3) We analyze the existing knowledge discovery process and implement an inner loop between data preprocessing and data mining in order to obtain the best mining results. The thesis uses the support vector machine decision tree algorithm (SVMDT) to carry out dimensionality reduction during data preprocessing. For the multi-class classification problem in high-dimensional data, the thesis proposes a classification method based on the bisecting K-means clustering algorithm and the SVMDT. First, the original dataset is transformed from the high-dimensional data space to a low-dimensional data space by principal component analysis (PCA). Second, the bisecting K-means clustering algorithm is applied in the low-dimensional space to obtain the class information of the samples. Then the matrix H relating the low- and high-dimensional spaces is used to generate class information for the high-dimensional data, which guides the SVMDT algorithm in classifying the dataset. This yields a new low-dimensional dataset and a new matrix H, and the bisecting K-means clustering algorithm is then applied again in the new low-dimensional space. The procedure is repeated adaptively until the results converge. The method thus avoids the curse of dimensionality and reaches convergent results adaptively. Finally, comparative experiments against the NLSVM algorithm and the SVM decision tree algorithm show the effectiveness of the BKM-SVMDT method.
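The abstract does not give the thesis's implementation, so the following is only a minimal sketch of the rough clustering stage (bisecting K-means) in plain Python. It assumes Euclidean distance, always splits the cluster with the largest within-cluster squared error, and keeps the best of a few trial splits. The names `bisecting_kmeans` and `two_means` and the parameters `trials`, `iters`, and `seed` are illustrative choices, not taken from the thesis. The PCA projection and the SVM decision tree stage are not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points (lists of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, iters=20):
    """Split one cluster in two with plain 2-means (Lloyd's algorithm)."""
    centers = rng.sample(points, 2)  # two random points as initial centers
    left, right = [], []
    for _ in range(iters):
        left, right = [], []
        for p in points:
            nearer_first = sq_dist(p, centers[0]) <= sq_dist(p, centers[1])
            (left if nearer_first else right).append(p)
        if not left or not right:
            break  # degenerate split, e.g. duplicate initial centers
        new_centers = [centroid(left), centroid(right)]
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return left, right


def bisecting_kmeans(points, k, trials=5, seed=0):
    """Return up to k clusters by repeatedly bisecting the worst cluster."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=sse)
        worst = clusters.pop()  # cluster with the largest SSE
        best = None
        if len(worst) >= 2:
            for _ in range(trials):
                left, right = two_means(worst, rng)
                if left and right:
                    cost = sse(left) + sse(right)
                    if best is None or cost < best[0]:
                        best = (cost, left, right)
        if best is None:
            clusters.append(worst)  # nothing left that can be split
            break
        clusters.extend(best[1:])
    return clusters


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration): three blobs in 5-D.
    rng = random.Random(42)
    data = [[base + rng.gauss(0, 0.5) for _ in range(5)]
            for base in (0.0, 10.0, 30.0) for _ in range(20)]
    for cluster in bisecting_kmeans(data, k=3):
        print(len(cluster), [round(v, 2) for v in centroid(cluster)])
```

In a pipeline like the one described above, the cluster membership produced by such a routine would serve as the class information passed on to the classification stage. How the thesis maps that information back to the high-dimensional space through the matrix H is not specified in the abstract.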
Keywords/Search Tags: bisecting K-means clustering, SVM decision tree algorithm, classification of high-dimensional data, adaptive methods