Research On Support Vector Machine Based On Improved CLIQUE Algorithm

Posted on: 2018-10-29    Degree: Master    Type: Thesis
Country: China    Candidate: M Z Xu
GTID: 2348330542990932    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of computer technology, data increasingly take the form of massive, high-dimensional, and nonlinear data sets. Extracting useful information from such data is an important problem in data mining. Support vector machines (SVMs) and clustering algorithms, as representative data mining algorithms, have attracted growing attention. The CLIQUE clustering algorithm is fast and handles high-dimensional data well, but its two parameters are difficult to set and its accuracy is limited, so the user needs a strong prior understanding of the data set. SVMs classify small-scale data sets accurately and efficiently, but they train very slowly on large-scale data sets.

This thesis proposes a method that speeds up SVM training on large-scale data sets by training on two smaller training sets instead of one large training set. First, an adaptive parameter determination algorithm and a grid density record table are proposed to address two weaknesses of CLIQUE: the difficulty of setting its parameters and the loss of high-density grid cells. The improved CLIQUE algorithm is then used to preprocess the SVM training data and obtain a representative sample set that reflects the distribution of the whole sample set. Training on this representative set yields an approximate separating hyperplane, and the three distance standards proposed in this thesis are used to collect the samples near that hyperplane, which form the exact training set. Finally, the optimal hyperplane is obtained by training on the exact training set.

The experimental results show that distance standard 1 is the fastest, but its accuracy is not guaranteed, while distance standard 3 is the most accurate but the slowest. Distance standard 2 is more accurate than standard 1 and has lower time complexity than standard 3. The SVM training-time optimization algorithm using distance standard 2 is therefore an effective way to accelerate SVM training on large data sets: it preserves training accuracy while training faster than the original SVM algorithm.
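The grid-based preprocessing step can be illustrated with a small sketch. It shows only plain CLIQUE-style density counting, not the thesis's improved algorithm: the abstract does not describe the adaptive parameter determination, so the interval count `xi` and density threshold `tau` are passed in by hand, and a dictionary of cell members merely stands in for the grid density record table. Grouping cells by class label and keeping the point closest to each dense cell's mean as its representative are assumptions made for illustration; all function names are hypothetical.

```python
from collections import defaultdict


def grid_cell(point, mins, widths, xi):
    """Map a point to its grid cell: one interval index per dimension."""
    idx = []
    for x, lo, w in zip(point, mins, widths):
        k = int((x - lo) / w) if w > 0 else 0
        idx.append(min(k, xi - 1))  # the maximum value falls into the last interval
    return tuple(idx)


def representative_samples(points, labels, xi, tau):
    """Return one representative per dense (cell, label) pair (hypothetical helper).

    xi:  number of equal-width intervals per dimension (fixed by hand here)
    tau: minimum fraction of all points a cell must hold to count as dense
    """
    n, dims = len(points), len(points[0])
    mins = [min(p[d] for p in points) for d in range(dims)]
    maxs = [max(p[d] for p in points) for d in range(dims)]
    widths = [(hi - lo) / xi for lo, hi in zip(mins, maxs)]

    # Stand-in for a grid density record: (cell, label) -> member indices.
    cells = defaultdict(list)
    for i, (p, y) in enumerate(zip(points, labels)):
        cells[(grid_cell(p, mins, widths, xi), y)].append(i)

    reps = []
    for (cell, y), members in sorted(cells.items()):
        if len(members) / n < tau:  # sparse cell: skipped, as in basic CLIQUE
            continue
        mean = [sum(points[i][d] for i in members) / len(members) for d in range(dims)]
        # Keep the real sample closest to the cell mean (assumed selection rule).
        best = min(members, key=lambda i: sum((points[i][d] - mean[d]) ** 2 for d in range(dims)))
        reps.append((points[best], y))
    return reps
```

With seven 2-D points forming one tight cluster per class plus a lone class-0 point, `xi=2` and `tau=0.2` keep one representative per cluster and drop the sparse cell, so the first SVM is trained on two points instead of seven.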
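The step that collects samples near the approximate hyperplane can be sketched in the same spirit. The abstract does not define the three distance standards, so this sketch uses only the ordinary geometric distance |w·x + b| / ||w|| from each sample to a linear hyperplane, with a hypothetical threshold `delta`; it is a stand-in for illustration, not any of the thesis's standards.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Geometric distance |w.x + b| / ||w|| from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must be non-zero")
    return abs(sum(wi * xj for wi, xj in zip(w, x)) + b) / norm


def select_near_hyperplane(points, labels, w, b, delta):
    """Keep samples within distance delta of the approximate hyperplane.

    The kept samples play the role of the exact training set for the
    second SVM fit; delta is a hypothetical, hand-chosen threshold.
    """
    return [(p, y) for p, y in zip(points, labels)
            if distance_to_hyperplane(p, w, b) <= delta]
```

With scikit-learn, for example, a binary `SVC(kernel="linear")` fitted on the representative set exposes `coef_[0]` and `intercept_[0]`, which could serve as `w` and `b` here before the second fit on the selected samples. A larger `delta` keeps more samples, trading training time for a lower risk of discarding true support vectors.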
Keywords/Search Tags: clustering algorithm, CLIQUE algorithm, support vector machine, training time optimization