Research And Application Of Feature Extraction And Ensemble Learning Algorithms

Posted on: 2016-06-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Hou
GTID: 1228330470958142
Subject: Computer application technology
Abstract/Summary:
Feature extraction and ensemble learning are long-standing research focuses in machine learning and have been applied very successfully. However, neither technology is mature: many issues remain unresolved in both research and practice, and their applications still fall far short of expectations. This thesis studies feature extraction and ensemble learning in depth. On the one hand, it analyzes the strengths and weaknesses of existing methods; on the other hand, it studies the factors that affect the performance of feature extraction and ensemble algorithms. This analysis paves the way for improving both, and new feature extraction and ensemble algorithms are proposed. The main contents and innovations of this thesis are as follows:
1) Kernel Principal Component Analysis (KPCA) and the Multilayer Perceptron neural network (MLP) are popular feature extraction algorithms. However, they are inefficient and prone to falling into locally optimal solutions. To overcome these problems, a novel feature extraction algorithm is proposed: the Enhanced Feature Extraction algorithm (EFE), based on margin-maximizing hyperplanes. EFE adopts pairwise orthogonal margin-maximizing hyperplanes and maps the input samples to the subspace spanned by the normals of these hyperplanes; it is independent of the probability distribution of the input samples. Feature extraction experiments on the real-world data sets Wine and AR show that EFE outperforms KPCA and MLP in both execution efficiency and recognition accuracy. Finally, the results of these experiments are explained.
2) Many feature extraction techniques rely on evaluating local properties of the data. After studying these popular techniques and their weaknesses, a novel feature extraction algorithm is proposed: the Robust Feature Extraction algorithm (RFE). RFE proceeds in two stages and aims to minimize the within-class distance while maximizing the between-class distance. Experiments on real-world data sets show that RFE reaches the best values of the performance indicators considered, namely classification accuracy and efficiency.
3) Traditional ensemble learning algorithms have the defect that they do not classify according to the characteristics of the data. To address this, the margin distribution is first used to describe the characteristics of the data and is then introduced into the standard support vector machine (SVM). The kernel is updated according to the distribution of the data through a conformal transformation of the original kernel function, which increases the Riemannian metric near the classification boundary and enlarges the margin between classes, yielding an improved SVM algorithm. This improved SVM is used as the base learner to construct a supervised ensemble algorithm based on the characteristics of the data: the Improved SVM ensemble (ISVM ensemble). The proposed ensemble algorithm not only enhances the generalization performance of the standard SVM but also reduces its sensitivity. Finally, the superiority of the ISVM ensemble algorithm is demonstrated experimentally.
4) Popular clustering ensemble algorithms have the defect that they cannot adapt their treatment to the different characteristics of different data sets. To overcome this, a novel clustering ensemble algorithm is proposed: the Enhanced Clustering Ensemble algorithm based on Characteristics of Data (ECECD). ECECD consists of three parts: generation of base clusterings, selection of base clusterings, and a consensus function. It selects a particular range of ensemble members to form the final ensemble and produces the final clustering based on the characteristics of the data. In the experiments, the clustering error of the proposed algorithm is always the lowest among the algorithms compared, and its normalized mutual information (NMI) values are always higher than those of the other algorithms as the number of candidate base clusterings increases. Compared with these popular clustering ensemble algorithms, the proposed algorithm therefore has the highest clustering precision and the strongest scalability, and is a promising adaptive clustering ensemble algorithm for data sets with different characteristics.
5) The application of the feature extraction and ensemble learning algorithms to intrusion detection is investigated using the KDDCUP99 intrusion detection data set. After the KDDCUP99 data set is appropriately preprocessed, the proposed feature extraction and ensemble learning algorithms are successfully applied to intrusion detection, and a novel intrusion detection model, the Ensemble Intrusion Detection System (Ensemble IDS), is proposed. Finally, the detection results of classical ensemble learning algorithms and of the proposed ensemble learning algorithm on the KDDCUP99 data set are compared.
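For readers unfamiliar with the NMI values used to compare clusterings in contribution 4), the following is a minimal sketch of how normalized mutual information between two label assignments can be computed. It is an illustration only, not the thesis's implementation: the abstract does not state which normalization was used, so the geometric-mean form (mutual information divided by the square root of the product of the two entropies) is an assumption, and the function names `entropy`, `mutual_information`, and `nmi` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a cluster label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two label assignments."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumed choice)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Convention chosen here: two single-cluster partitions agree fully,
        # otherwise a single-cluster partition carries no information.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(nmi(truth, [1, 1, 0, 0]))  # same partition with labels swapped
    print(nmi(truth, [0, 1, 0, 1]))  # partition independent of the truth
```

NMI depends only on how samples are grouped, not on the label names, which is why it suits comparing a consensus clustering against reference classes. Under this normalization the score lies between 0 and 1, with higher values indicating closer agreement.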
Keywords/Search Tags: Feature Extraction, Maximum Margin, Ensemble Learning Algorithm, Classification, Intrusion Detection