Research On Outlier Detection Based On Selective Ensemble Learning

Posted on:2017-02-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Zhang

Full Text:PDF

GTID:2308330503959952

Subject:Software engineering

Abstract/Summary:

Outlier detection is an important research direction in data mining. Outliers are data objects that differ markedly from the other objects: they are inconsistent with the existing model or deviate from it. Outlier detection aims to uncover potential, meaningful and valuable knowledge, and it is widely used in fields such as fraud detection, intrusion detection, medical image recognition, astronomical observation, and the detection of agricultural pests and diseases. However, existing outlier detection methods still face many problems. On the one hand, because of data sparsity and high dimensionality, existing methods find it difficult to detect outliers effectively in high-dimensional data. On the other hand, their generalization performance is still relatively poor, so both their detection performance and their detection efficiency on new data are low.
Ensemble learning combines several different single models into a composite model and can effectively improve generalization performance by exploiting the diversity among those single models. Selective ensemble learning chooses only a subset of base learners that are both highly accurate and highly diverse to build the ensemble, which reduces the computation and storage costs of the learning system and is expected to achieve even better generalization performance.
To address these problems of current outlier detection methods, this thesis studies outlier detection based on selective ensemble learning, concentrating on two issues: detecting outliers in high-dimensional data, and improving the generalization performance of outlier detection algorithms. First, for outlier detection in high-dimensional data, the concept of an approximate reduct is proposed, together with an ensemble learning algorithm based on approximate reducts.
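The abstract does not define the approximate reduct. Purely as a point of reference, the sketch below assumes one simple relaxation of the classical rough-set reduct for a table without a decision attribute: an attribute subset B is accepted when the partition U/IND(B) keeps at least (1 - eps) of the equivalence classes of U/IND(C), where C is the full attribute set. Because U/IND(C) refines U/IND(B), eps = 0 yields a subset that preserves the full partition (a superset of a reduct, since greedy selection does not guarantee minimality). The names `partition` and `approximate_reduct` and the threshold `eps` are hypothetical, and greedy forward selection with random tie-breaking is a standard heuristic, not necessarily the procedure used in the thesis.

```python
import random
from collections import defaultdict


def partition(rows, attrs):
    """Return U/IND(attrs): groups of row indices that agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def approximate_reduct(rows, eps=0.1, seed=None):
    """Greedily build an attribute subset B with
    |U/IND(B)| >= (1 - eps) * |U/IND(C)|, where C is the full attribute set.

    Hypothetical relaxation for illustration only; the thesis's own
    definition of an approximate reduct is not given in the abstract.
    """
    rng = random.Random(seed)
    all_attrs = list(range(len(rows[0])))
    target = (1 - eps) * len(partition(rows, all_attrs))
    chosen, remaining = [], all_attrs[:]
    # For eps >= 0 this terminates: choosing every attribute reaches the full partition.
    while len(partition(rows, chosen)) < target:
        # Score each candidate by the number of classes it yields when added.
        scores = {a: len(partition(rows, chosen + [a])) for a in remaining}
        best = max(scores.values())
        # Random tie-breaking lets different seeds return different subsets.
        pick = rng.choice([a for a in remaining if scores[a] == best])
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)
```

When several attributes tie, different `seed` values can return different subsets, which is the kind of variety an ensemble built over several reducts could use to train base detectors on different feature views.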
In addition, a selective ensemble learning algorithm based on approximate reducts and optimal sampling is proposed. Second, to improve the generalization performance of outlier detection algorithms, a selective ensemble learning algorithm based on multi-modal perturbation is proposed, together with a selective ensemble learning algorithm based on randomized greedy selection. The proposed algorithms are applied to outlier detection, and the experimental results show that they achieve better outlier detection performance than traditional algorithms.
The main work of this thesis covers the following aspects:
First, the concept of a reduct in rough set theory is extended to that of an approximate reduct, and on this basis an ensemble learning algorithm based on approximate reducts (ELAR) is proposed. ELAR effectively reduces the dimensionality of high-dimensional data and achieves better outlier detection performance. Experimental results show that ELAR outperforms existing algorithms in both detection accuracy and time complexity.
Second, building on the approximate reduct, a selective ensemble learning algorithm based on approximate reducts and optimal sampling (SE_AROS) is proposed. Experimental results show that SE_AROS performs better than traditional algorithms.
Third, to address the limitations of single-modal perturbation in ensemble learning, a selective ensemble learning algorithm based on multi-modal perturbation (ELSR) is proposed. ELSR combines sampling techniques with rough-set attribute reduction to perturb the training data in more than one way, which yields a set of diverse base learners.
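ELSR is described as combining sampling with rough-set attribute reduction, so that base learners differ both in the rows and in the attributes they see. The sketch below illustrates that two-way perturbation under stated assumptions: bootstrap sampling perturbs the rows, random attribute subsets stand in for the rough-set reducts, each base detector scores an object by its mean distance to its k nearest neighbours, and the ensemble averages the scores. The function names and parameters (`knn_score`, `build_perturbed_pool`, `ensemble_score`, `sample_frac`, `n_attrs`, `k`) are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
import random


def knn_score(train, x, attrs, k):
    """Base detector: mean distance from x to its k nearest neighbours in train,
    using only the attributes in attrs. The object x itself is skipped by identity."""
    px = [x[a] for a in attrs]
    dists = sorted(math.dist(px, [t[a] for a in attrs]) for t in train if t is not x)
    nearest = dists[:k]
    # Return 0.0 if the sample contains no object other than x.
    return sum(nearest) / len(nearest) if nearest else 0.0


def build_perturbed_pool(data, n_learners, sample_frac=0.8, n_attrs=2, seed=0):
    """Two-mode perturbation: every base detector gets a bootstrap sample of the
    rows and a random attribute subset (a stand-in for rough-set reducts)."""
    rng = random.Random(seed)
    d = len(data[0])
    m = max(1, int(sample_frac * len(data)))
    pool = []
    for _ in range(n_learners):
        sample = [rng.choice(data) for _ in range(m)]            # row perturbation
        attrs = sorted(rng.sample(range(d), min(n_attrs, d)))    # attribute perturbation
        pool.append((sample, attrs))
    return pool


def ensemble_score(pool, x, k=3):
    """Combine the base detectors by averaging their outlier scores."""
    return sum(knn_score(sample, x, attrs, k) for sample, attrs in pool) / len(pool)
```

Averaging is only the simplest combination rule; a selective ensemble would additionally prune this pool to a subset of accurate and diverse detectors.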
Experiments on multiple data sets show that ELSR achieves better outlier detection performance than the other algorithms compared.
Finally, since selective ensemble algorithms based on greedy search easily fall into local optima, a selective ensemble algorithm based on randomized greedy selection (NSERGS) is proposed. NSERGS enlarges the search space of greedy search by introducing a randomized strategy, which both lowers the probability of the algorithm becoming trapped in a local optimum and improves its performance.
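The abstract states only that NSERGS adds a randomized strategy to greedy selection. One common way to do that, borrowed here from GRASP-style restricted candidate lists and not taken from the thesis, is to pick at random among the `rcl` best candidates at each greedy step and to repeat the search from several seeds, keeping the best subset found. The sketch assumes a labelled validation set, scores each candidate ensemble by the ROC AUC of the averaged base-detector scores, and uses hypothetical names throughout; with `rcl=1` it reduces to ordinary greedy forward selection.

```python
import random


def auc(scores, labels):
    """ROC AUC by pairwise comparison; labels are 1 (outlier) or 0 (inlier)
    and both classes must be present."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average(score_table, members):
    """Average the score vectors of the selected base detectors."""
    n = len(score_table[members[0]])
    return [sum(score_table[m][i] for m in members) / len(members) for i in range(n)]


def randomized_greedy_select(score_table, labels, size, rcl=2, seed=0):
    """Forward selection that, at each step, picks uniformly among the `rcl`
    best candidates instead of always taking the single best one."""
    rng = random.Random(seed)
    selected, remaining = [], list(range(len(score_table)))
    while remaining and len(selected) < size:
        # Rank candidates by the AUC of the ensemble they would form.
        ranked = sorted(
            remaining,
            key=lambda j: auc(average(score_table, selected + [j]), labels),
            reverse=True,
        )
        pick = rng.choice(ranked[:rcl])
        selected.append(pick)
        remaining.remove(pick)
    return selected


def select_with_restarts(score_table, labels, size, restarts=5, rcl=2, seed=0):
    """Repeat the randomized greedy search and keep the best-scoring subset."""
    best, best_auc = None, -1.0
    for r in range(restarts):
        subset = randomized_greedy_select(score_table, labels, size, rcl, seed + r)
        value = auc(average(score_table, subset), labels)
        if value > best_auc:
            best, best_auc = subset, value
    return best, best_auc
```

A larger `rcl` or more `restarts` widens the search at the cost of more AUC evaluations; the trade-off mirrors the abstract's point that randomization lowers the chance of stopping in a local optimum.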

Keywords/Search Tags:

outlier, ensemble learning, rough sets, selective ensemble, reduction, algorithm

Related items

1	Research On Multi-label Selective Ensemble Based On Variable Precision Neighborhood Rough Set
2	The Research On Reduction Algorithm Of Rough Sets Theory
3	Research On Some Issues Of Rough Sets Theory And Its Applications
4	The Research On Application Of Rough Sets In Ensemble Learning
5	Research Of Selective Ensemble Learning And Its Application
6	Study And Application Of Attribute Reduction Algorithms Based On Rough Sets
7	Matroidal And Topological Approaches To Rough Sets
8	Research On Incremental Reduction Algorithm Based On Rough Sets
9	Research On Decision Tree Algorithm Based On Rough Sets And Ensemble Learning
10	Research Of Multiple Classifiers Ensemble Learning Method Based On Rough Set