Research On Outlier Detection Based On Selective Ensemble Learning

Posted on: 2017-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhang
Full Text: PDF
GTID: 2308330503959952
Subject: Software engineering
Abstract/Summary:
Outlier detection is an important research direction in data mining. Outliers are data objects that differ markedly from the other objects, either being incompatible with the existing models or deviating from them. Outlier detection aims to find potential, meaningful and valuable knowledge, and it is widely used in fields such as fraud detection, intrusion detection, medical image recognition, astronomical observation, and the detection of agricultural pests and diseases. However, existing outlier detection methods still face many problems. On the one hand, because of data sparseness and high dimensionality, it is difficult for existing methods to detect outliers effectively in high-dimensional data. On the other hand, the generalization performance of existing methods is still relatively poor, so their detection performance and detection efficiency on new data are not high.
Ensemble learning combines several different single models into one composite model, and it can effectively improve the generalization performance of the ensemble by exploiting the diversity among these single models. Selective ensemble learning chooses only a subset of the base learners, those with both high accuracy and large diversity, to construct the ensemble learner. This reduces the computation and storage cost of the learning system and is expected to achieve better generalization performance.
Aiming at the problems in current outlier detection methods, this paper studies outlier detection based on selective ensemble learning, focusing on two problems: detecting outliers in high-dimensional data, and improving the generalization performance of outlier detection algorithms. Firstly, for outlier detection in high-dimensional data, the concept of approximate reduct is proposed, and an ensemble learning algorithm based on approximate reducts is proposed.
In addition, a selective ensemble learning algorithm based on approximate reducts and optimal sampling is proposed. Secondly, to improve the generalization performance of outlier detection algorithms, a selective ensemble learning algorithm based on multi-modal perturbation is proposed. In addition, a selective ensemble learning algorithm based on randomized greedy selection is proposed. The proposed algorithms are applied to outlier detection, and the experimental results demonstrate that they achieve better outlier detection performance than the traditional algorithms.
The main work of this paper includes the following aspects:
First, the concept of reduct in rough sets is extended to the concept of approximate reduct, and an ensemble learning algorithm (called ELAR) based on approximate reducts is proposed. ELAR can effectively reduce the dimensionality of high-dimensional data and obtain better outlier detection performance. Experimental results show that ELAR is better than the existing algorithms in detection accuracy and time complexity.
Secondly, building on the approximate reduct proposed above, a selective ensemble learning algorithm (called SE_AROS) based on approximate reducts and optimal sampling is proposed. Experimental results show that SE_AROS performs better than traditional algorithms.
Then, to address the problems of single-modal perturbation in ensemble learning, a selective ensemble learning algorithm (called ELSR) based on multi-modal perturbation is proposed. ELSR uses sampling techniques and attribute reduction methods from rough sets for multi-modal perturbation, which yields a set of diverse base learners.
Experimental results on multiple data sets show that, compared with other algorithms, ELSR has better outlier detection performance.
Finally, because selective ensemble learning algorithms based on the greedy method easily fall into local optima, a selective ensemble algorithm (called NSERGS) based on randomized greedy selection is proposed. NSERGS extends the search space of the greedy search by introducing a randomized strategy, which not only reduces the probability of the algorithm being trapped in a local optimum but also improves its performance.
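The abstract does not give the details of NSERGS, so the following is only a minimal sketch of the general idea of randomized greedy selection, not the thesis's algorithm. It assumes each base detector has already produced outlier scores on a labeled validation set, that the ensemble averages those scores, and that quality is measured by AUC. The function names, the `top_k` and `restarts` parameters, and the rule of sampling uniformly among the best `top_k` candidates are all illustrative assumptions.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_scores(members, all_scores):
    """Average the outlier scores of the selected base detectors."""
    n = len(all_scores[0])
    return [sum(all_scores[m][i] for m in members) / len(members)
            for i in range(n)]


def randomized_greedy_select(all_scores, labels, top_k=2, restarts=10, seed=0):
    """Forward selection that samples among the top_k candidates per step.

    Assumed scheme (not taken from the thesis): top_k=1 is plain greedy
    selection; larger values widen the search at each step.
    """
    rng = random.Random(seed)
    best_subset, best_auc = [], -1.0
    for _ in range(restarts):
        chosen, current = [], -1.0
        remaining = list(range(len(all_scores)))
        while remaining:
            # Rank every remaining detector by the AUC of the ensemble
            # obtained after adding it.
            ranked = sorted(
                ((auc(ensemble_scores(chosen + [c], all_scores), labels), c)
                 for c in remaining),
                reverse=True,
            )
            # Randomization step: sample one of the top_k candidates.
            gain, pick = rng.choice(ranked[:top_k])
            # Stop this run when the sampled candidate does not improve AUC.
            if gain <= current:
                break
            chosen.append(pick)
            remaining.remove(pick)
            current = gain
        if current > best_auc:
            best_subset, best_auc = sorted(chosen), current
    return best_subset, best_auc
```

Calling `randomized_greedy_select(all_scores, labels)` with a list of per-detector score lists returns the selected detector indices and their validation AUC. Repeating the search over several restarts and keeping the best subset is one simple way to let the randomized choices explore subsets that a purely greedy pass would never reach.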
Keywords/Search Tags: outlier, ensemble learning, rough sets, selective ensemble, reduction, algorithm