
Research On Classification Algorithm Of Data Mining Based On Improved Support Vector Machine

Posted on: 2017-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
GTID: 2308330509453169
Subject: Detection Technology and Automation
Abstract/Summary:
With the rapid development of information and computer technology, data has accumulated at an explosive rate. These data contain abundant latent knowledge and information, and how to extract and use that information effectively has become a focus of research. Data mining, a continually developing technology, helps people discover latent knowledge and information in data. As an effective data mining classification method, the support vector machine (SVM), which is based on statistical learning theory and structural risk minimization, classifies unknown samples by constructing the optimal separating hyperplane in attribute space. It has strong generalization ability and handles nonlinear data well, but it still has some shortcomings. This thesis studies and analyzes SVM in depth; the main research results are as follows:

1. To address the slow training speed on large data sets and the sensitivity to noise of the fuzzy support vector machine (FSVM) in data mining classification, an improved FSVM-based classification algorithm is proposed. First, the algorithm preselects candidate support vectors to reduce the number of training samples and thus improve training speed. Second, a novel membership function is defined to strengthen the role of support vectors in constructing the FSVM. Finally, neighborhood sample density is incorporated into the design of the membership function to reduce the influence of noise and outliers on classification and so improve classification validity. Experimental results show that the proposed algorithm improves training speed and classification accuracy. To overcome the slow testing speed of FSVM on large data sets in data mining classification, a second improved FSVM classification algorithm is presented. The algorithm preselects an effective candidate set to reduce the number of training samples and improve training speed.
A particle swarm optimization algorithm is then used to optimize the support vector set, with the average classification error as the fitness function. Finally, the membership of each sample is compared with a threshold, and samples with large membership are selected as support vectors to improve testing speed. Experimental results show that the presented algorithm improves both the training speed and the testing speed of the FSVM while maintaining good classification accuracy.

2. The ball vector machine (BVM) trains faster than the standard SVM, but it performs poorly on imbalanced data sets. To solve this problem, a classification algorithm for imbalanced data sets based on an improved BVM is proposed. The proposed algorithm decomposes the imbalanced training set by randomly dividing it into subsets containing the same number of samples as the positive class; each subset is combined with the positive samples to form a balanced training set. Then the rotation forest method is applied to preprocess the new balanced sets and train the base classifiers. Finally, ensemble techniques combine the base classifiers into a classifier for the imbalanced data set. Experiments on UCI data sets, with comparisons against BVM, ESt SVM, AdaBoost-SVM-OBMS, and En SVM, show that the proposed algorithm effectively improves classification performance on imbalanced data sets.

3. High-dimensional, imbalanced data are common in real life, but because of the effects of sample distribution and dimensionality, traditional data mining classification algorithms perform poorly on them. To solve this problem, an SVM-based classification algorithm for high-dimensional, imbalanced data is proposed. The algorithm first uses an improved KSMOTE (Kernel Synthetic Minority Over-sampling Technique) to synthesize positive-class samples. Then feature selection based on sparse representation is applied to select features in the feature space.
Finally, the pre-images of the synthetic instances are found in input space based on a distance relation, and an SVM is used for classification. Experiments show that the proposed algorithm improves SVM classification performance on high-dimensional, imbalanced data sets.
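As a rough illustration of the over-sampling step in item 3, the sketch below implements plain SMOTE in input space with NumPy. It is not the improved KSMOTE described above, which works in kernel feature space and then recovers pre-images. The function name `smote_oversample` and its parameters `n_new`, `k`, and `seed` are assumptions made for this example.

```python
import numpy as np


def smote_oversample(minority, n_new, k=3, seed=0):
    """Plain input-space SMOTE (illustrative, not the thesis's KSMOTE).

    Each synthetic sample is a random point on the line segment between
    a minority sample and one of its k nearest minority neighbours.
    """
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    rng = np.random.default_rng(seed)

    # Pairwise squared Euclidean distances between minority samples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # exclude each sample from its own neighbours

    # Indices of the k nearest neighbours of every minority sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for every new point.
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random gap in [0, 1).
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[partner] - minority[base])


if __name__ == "__main__":
    positives = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    synthetic = smote_oversample(positives, n_new=6, k=2, seed=1)
    print(synthetic.shape)
```

In a pipeline like the one the abstract outlines, the synthetic rows would be appended to the positive class before training a classifier, for example scikit-learn's `SVC`.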
Keywords/Search Tags:data mining, classification, support vector machine, membership function, imbalanced dataset, rotation forest algorithm, kernel SMOTE