
Study On Ensemble One-class Classification And Its Applications

Posted on: 2016-10-24 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J C Liu
GTID: 1108330488457665 | Subject: Computer application technology
Abstract/Summary:
One-class classification is a special kind of machine learning task. A one-class classification algorithm builds a descriptive model of a training dataset that contains samples of only the positive (target) class. This model then distinguishes the positive class from unknown samples of any other class. Much progress has been made in one-class classification over the past two decades, and one-class classifiers have been applied to problems such as anomaly detection, target recognition and image classification. Meanwhile, ensemble learning schemes such as random forests and Boosting are among the most important meta-algorithms in machine learning, so combining one-class classification with ensemble learning is an important research problem. This dissertation studies several types of ensemble one-class classification algorithms and their application to malware analysis. The main contributions are outlined as follows.

1. For modular ensembles of one-class classifiers, a novel Density Based Modular Ensemble One-class Classifier (DBM-EOC) is proposed. DBM-EOC first performs density analysis on the training dataset, using the local density characteristics of the positive class to obtain a minimum spanning tree. On this basis, the training samples are divided into several groups and a simple one-class classifier is trained for each group. Finally, all base classifiers are aggregated modularly into the final DBM-EOC model. DBM-EOC performs better than conventional one-class classifiers, especially on training datasets that contain multiple clusters, multiple densities or noise.

2. For clustering-based ensemble one-class classification, the problems of determining the number of clusters and of controlling the high computational complexity are analyzed, and an Ensemble Clustering based Stable SVDD (ECS-SVDD) algorithm is proposed. ECS-SVDD introduces clustering stability analysis.
The identification of the number of clusters and of their distributions is unified within one enhancing framework. Multiple one-class classifiers are then constructed to describe the clusters of the target class, and these classifiers are finally fused by the maximum fusion volume method. Experimental results show that ECS-SVDD outperforms a single SVDD and several other related one-class classification algorithms. Because clustering-based ensemble one-class classifiers often have high computational complexity, a Fast Structural Ensemble One-Class Classifier (FS-EOCC) is also proposed. FS-EOCC approximates the number of clusters from the size of the training dataset and the computational complexity of the base classifier, and uses a two-round clustering method to avoid the negative influence of an inaccurate approximation. Since FS-EOCC does not need to run the clustering algorithm repeatedly, its computational complexity is of the same order as that of common non-ensemble one-class classifiers.

3. For classic ensemble one-class classification, a novel sequential ensemble one-class classification algorithm named OCCBoost is proposed. The difficulties in designing a sequential ensemble one-class classifier are the lack of a theoretical framework and the difficulty of evaluating base models during training. First, starting from Bayesian decision theory, OCCBoost shows that the one-class classification problem can be treated as ranking samples by their estimated posterior probabilities, so the idea of boosted ranking can be used to design a Boosting one-class classification algorithm. Second, a novel artificial outlier generation method and a corresponding weak one-class classifier are developed to train and evaluate the base one-class classification models.
OCCBoost requires more iterations, but its overall computational complexity is low and its performance is competitive with conventional one-class classifiers.

4. Post-processing an ensemble learning model can improve both its speed and its performance. Ensemble pruning is therefore introduced into ensemble one-class classification, yielding a novel Pruned Hybrid Diverse Ensemble One-class Classifier (PHD-EOC). Although an ensemble can improve one-class classification performance, performance can also degrade if the set of base classifiers is not selected carefully, and this study further shows through analysis that a lack of diversity is a major cause of such degradation. Accordingly, a hybrid method for generating diverse base classifiers is first proposed. Second, in the combination phase, the one-class ensemble loss is decomposed and analyzed theoretically to find the most useful diversity, which leads to a diversity-based pruning method. Experimental results show that PHD-EOC strikes a better balance between base-classifier diversity and classification performance.

5. The value of a machine learning algorithm lies in its ability to solve practical problems. A malware detection Framework based on ENsemble One-Class classification (FENOC) is therefore proposed and evaluated. FENOC uses both static and dynamic analysis to collect complete behavioral data. It then uses hybrid features at different semantic layers, including features from a novel Bilayer Behavior Abstraction (BLBA) method, to give a comprehensive view of the program under analysis. The malware detector is an ensemble of a novel learning algorithm called CosTOC (Cost-sensitive Twin One-class Classifier), which uses a pair of one-class classifiers to describe malware and benign programs respectively. A random subspace method and a clustering-based ensemble method are developed to enhance the generalization ability of CosTOC.
Experimental results show that FENOC achieves a detection rate comparable to, and a false positive rate lower than, those of commonly used binary classification algorithms, especially when the detector is trained on imbalanced data or evaluated in terms of false positive rate.
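Contribution 1 builds a minimum spanning tree, splits the target data into groups and aggregates one simple one-class classifier per group. A minimal sketch of that modular idea follows, under loud assumptions: it uses a plain Euclidean MST instead of the local-density analysis DBM-EOC performs, cuts MST edges longer than a hypothetical `cut_factor` times the median edge length, discards groups smaller than a hypothetical `min_group_size` as noise, and uses a centroid hypersphere as the simple base classifier. All function and class names are illustrative, not the dissertation's code.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Sequence[float]

def mst_edges(points: Sequence[Point]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge linking each node to the tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def split_into_groups(points: Sequence[Point], cut_factor: float = 3.0) -> List[List[int]]:
    """Cut MST edges longer than cut_factor * median edge length (hypothetical rule)
    and return the connected components as lists of point indices."""
    edges = mst_edges(points)
    root = list(range(len(points)))          # union-find parent array

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]          # path halving
            x = root[x]
        return x

    if edges:
        threshold = cut_factor * statistics.median(e[2] for e in edges)
        for i, j, length in edges:
            if length <= threshold:          # keep short edges, i.e. merge their endpoints
                root[find(i)] = find(j)
    groups = {}
    for idx in range(len(points)):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())

class ModularHypersphereEnsemble:
    """One hypersphere per MST group; a sample is accepted if any sphere accepts it."""

    def __init__(self, cut_factor: float = 3.0, min_group_size: int = 2):
        self.cut_factor = cut_factor
        self.min_group_size = min_group_size   # smaller groups are treated as noise
        self.spheres: List[Tuple[List[float], float]] = []

    def fit(self, points: Sequence[Point]) -> "ModularHypersphereEnsemble":
        self.spheres = []
        for group in split_into_groups(points, self.cut_factor):
            if len(group) < self.min_group_size:
                continue
            members = [points[i] for i in group]
            dim = len(members[0])
            center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
            radius = max(math.dist(center, p) for p in members)
            self.spheres.append((center, radius))
        return self

    def accepts(self, x: Point) -> bool:
        return any(math.dist(center, x) <= radius for center, radius in self.spheres)

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (30, 30)]
    model = ModularHypersphereEnsemble().fit(data)
    print(len(model.spheres), model.accepts((0.5, 0.5)), model.accepts((5, 5)))
```

With two separated square clusters and one distant noise point, the sketch keeps two spheres and rejects the empty gap between the clusters, which a single hypersphere around all of the data would accept.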
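Contribution 2 states that FS-EOCC stays within the same complexity order as a single one-class classifier. A back-of-the-envelope check (an illustration only, not FS-EOCC's formula for the number of clusters) shows why partitioning the target class need not add training cost: assume training a base classifier on m samples costs on the order of m^p with p ≥ 1 (p is often quoted as about 3 for a generic quadratic-programming SVDD solver), and that k clusters of roughly n/k samples each are trained separately.

```latex
\underbrace{k \left(\frac{n}{k}\right)^{p}}_{k \text{ base classifiers}}
  \;=\; \frac{n^{p}}{k^{\,p-1}} \;\le\; n^{p},
  \qquad p \ge 1,\; k \ge 1 .
```

Under this assumption the extra cost of such an ensemble comes mainly from clustering, which is consistent with the abstract's point that running the clustering algorithm once, rather than repeatedly, keeps FS-EOCC at the same order as a non-ensemble classifier.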
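Contribution 3 treats one-class classification as ranking and relies on artificial outliers to evaluate base models. The sketch below illustrates only these generic ingredients, not OCCBoost's novel outlier generator or its boosting loop: outliers are drawn uniformly from an enlarged bounding box of the targets (a common baseline, swapped in here), and a base scorer is judged by the fraction of target/outlier pairs it ranks correctly, which equals the AUC. `bounding_box_outliers`, `ranking_auc`, `centroid_scorer` and the `margin` parameter are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

def bounding_box_outliers(targets: Sequence[Sequence[float]], n_outliers: int,
                          margin: float = 0.1, seed: int = 0) -> List[Tuple[float, ...]]:
    """Sample artificial outliers uniformly from the targets' bounding box,
    enlarged on each side of every axis by `margin` times the box width."""
    rng = random.Random(seed)
    dim = len(targets[0])
    lows, highs = [], []
    for d in range(dim):
        lo = min(p[d] for p in targets)
        hi = max(p[d] for p in targets)
        pad = margin * (hi - lo)
        lows.append(lo - pad)
        highs.append(hi + pad)
    return [tuple(rng.uniform(lows[d], highs[d]) for d in range(dim))
            for _ in range(n_outliers)]

def ranking_auc(target_scores: Sequence[float], outlier_scores: Sequence[float]) -> float:
    """Probability that a random target outranks a random outlier (ties count 1/2)."""
    wins = 0.0
    for t in target_scores:
        for o in outlier_scores:
            if t > o:
                wins += 1.0
            elif t == o:
                wins += 0.5
    return wins / (len(target_scores) * len(outlier_scores))

def centroid_scorer(targets: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], float]:
    """Toy base model: a higher score means closer to the target centroid."""
    dim = len(targets[0])
    center = [sum(p[d] for p in targets) / len(targets) for d in range(dim)]
    return lambda x: -math.dist(center, x)

if __name__ == "__main__":
    targets = [(x / 10, y / 10) for x in range(-5, 6) for y in range(-5, 6)]
    outliers = bounding_box_outliers(targets, n_outliers=200, margin=1.0)
    score = centroid_scorer(targets)
    print(ranking_auc([score(t) for t in targets], [score(o) for o in outliers]))
```

Because the AUC depends only on the order of scores, any monotone transform of a base model's output leaves it unchanged, which is what makes a ranking view convenient when base models produce uncalibrated scores.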
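Contribution 4 prunes an ensemble on the basis of diversity. PHD-EOC's actual criterion comes from its decomposition of the one-class ensemble loss, which the abstract does not give, so this sketch swaps in a generic greedy rule: start from the most accurate base classifier, then repeatedly add the candidate that maximizes a weighted sum of accuracy and mean pairwise disagreement with the classifiers already chosen. Predictions are 1 (accepted as target) or 0 (rejected) on a validation set assumed to contain outliers; `trade_off` and all function names are hypothetical.

```python
from typing import List, Sequence

def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of validation samples on which two base classifiers disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def greedy_diverse_prune(preds: Sequence[Sequence[int]], labels: Sequence[int],
                         k: int, trade_off: float = 0.5) -> List[int]:
    """Select up to k (k >= 1) base classifiers: start from the most accurate one, then
    repeatedly add the candidate maximizing
    (1 - trade_off) * accuracy + trade_off * mean disagreement with those already chosen."""
    accuracy = [sum(p == y for p, y in zip(row, labels)) / len(labels) for row in preds]
    chosen = [max(range(len(preds)), key=lambda i: accuracy[i])]
    while len(chosen) < min(k, len(preds)):
        def gain(i: int) -> float:
            diversity = sum(disagreement(preds[i], preds[j]) for j in chosen) / len(chosen)
            return (1 - trade_off) * accuracy[i] + trade_off * diversity
        candidates = [i for i in range(len(preds)) if i not in chosen]
        chosen.append(max(candidates, key=gain))   # ties go to the lowest index
    return chosen

if __name__ == "__main__":
    preds = [[1, 1, 1, 0, 0, 1],   # classifier 0
             [1, 1, 1, 0, 0, 1],   # classifier 1: exact duplicate of 0
             [1, 1, 0, 0, 0, 0],   # classifier 2: equally accurate, different errors
             [0, 0, 0, 1, 1, 1]]   # classifier 3: always wrong
    labels = [1, 1, 1, 0, 0, 0]
    print(greedy_diverse_prune(preds, labels, k=2))
    print(greedy_diverse_prune(preds, labels, k=2, trade_off=0.0))
```

In this toy case, accuracy alone (`trade_off=0.0`) selects the redundant duplicate of the first classifier, while the default trade-off prefers the equally accurate classifier that errs on different samples.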
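Contribution 5 describes CosTOC as a pair of one-class classifiers, one for malware and one for benign programs, made cost-sensitive and ensembled with random subspaces. The sketch below is a hypothetical stand-in for that structure: each one-class model is a scaled distance to a centroid, a sample is flagged as malware only when `cost` times its malware score is below its benign score (so `cost > 1` trades some detections for fewer false positives), and several twins trained on random feature subsets vote by majority. The decision rule, `cost`, `subspace` and all names are assumptions, not CosTOC's actual formulation.

```python
import math
import random
from typing import List, Optional, Sequence

Row = Sequence[float]

class CentroidOneClass:
    """Toy one-class model: distance to the training centroid, divided by the
    mean training distance so that two models' scores are comparable."""

    def fit(self, rows: Sequence[Row]) -> "CentroidOneClass":
        dim = len(rows[0])
        self.center = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
        mean_dist = sum(math.dist(self.center, r) for r in rows) / len(rows)
        self.scale = mean_dist if mean_dist > 0 else 1.0
        return self

    def score(self, x: Row) -> float:
        return math.dist(self.center, x) / self.scale   # lower = more typical

class TwinOneClass:
    """One model per class; cost > 1 demands stronger evidence before flagging malware."""

    def __init__(self, cost: float = 1.0, features: Optional[List[int]] = None):
        self.cost = cost
        self.features = features          # random subspace; None means all features

    def _project(self, x: Row) -> List[float]:
        return [x[i] for i in self.features] if self.features is not None else list(x)

    def fit(self, malware: Sequence[Row], benign: Sequence[Row]) -> "TwinOneClass":
        self.mal = CentroidOneClass().fit([self._project(r) for r in malware])
        self.ben = CentroidOneClass().fit([self._project(r) for r in benign])
        return self

    def predict(self, x: Row) -> bool:
        z = self._project(x)
        return self.cost * self.mal.score(z) < self.ben.score(z)   # True = malware

def random_subspace_ensemble(malware: Sequence[Row], benign: Sequence[Row], n_models: int = 5,
                             subspace: int = 2, cost: float = 1.0,
                             seed: int = 0) -> List[TwinOneClass]:
    """Train n_models twin classifiers, each on a random subset of `subspace` features."""
    rng = random.Random(seed)
    dim = len(malware[0])
    return [TwinOneClass(cost, rng.sample(range(dim), subspace)).fit(malware, benign)
            for _ in range(n_models)]

def vote(models: Sequence[TwinOneClass], x: Row) -> bool:
    """Majority vote: True means the ensemble flags x as malware."""
    return sum(m.predict(x) for m in models) > len(models) / 2

if __name__ == "__main__":
    benign = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    malware = [(5, 5, 5), (6, 5, 5), (5, 6, 5), (5, 5, 6)]
    for c in (1.0, 1.5):
        models = random_subspace_ensemble(malware, benign, cost=c)
        print(c, vote(models, (3.0, 3.0, 3.0)))
```

On this toy data, the point (3, 3, 3) lies closer to the malware centroid, so it is flagged with `cost=1.0` but not with `cost=1.5`, which shows how a cost weight can shift the detector toward a lower false positive rate.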
Keywords/Search Tags: Machine Learning, One-class Classification, Ensemble Learning, Clustering Analysis, Selective Ensemble, Malware Detection