On Concept Drift, Deployability, and Adversarial Selection in Machine Learning-Based Malware Detection

Posted on:2013-11-17

Degree:Ph.D

Type:Dissertation

University:University of Louisiana at Lafayette

Candidate:Singh, Anshuman

GTID:1458390008475217

Subject:Computer Science

Abstract/Summary:

Machine learning-based methods are used for malware detection because they can automatically learn detection rules from examples. Applying them effectively requires addressing problems that arise from the adversarial nature of the malware domain. We address three such problems in this dissertation: concept drift, deployable classifier selection, and adversarial configuration of a selection-based AV system.

Concept drift results from nonstationary populations. Malware populations may not be stationary because malware evolves to evade detection. Machine learning methods for malware detection assume that the malware population is stationary, i.e., that the probability distribution of the observed characteristics (features) of the population does not change over time. We investigate this assumption for malware families as populations. We propose two measures for tracking concept drift in malware families when feature sets are very large: relative temporal similarity and metafeatures. Our study using the proposed measures on 4000+ samples from three real-world families of x86 malware, spanning over 5 years, shows negligible drift in mnemonic 2-grams extracted from unpacked versions of the samples.

A novel classifier selection criterion, called deployability, is proposed. Deployability explicitly takes into account the performance target that the deployed classifier is expected to meet on unseen data. The performance target, in conjunction with an interval estimate of the generalization performance of candidate classifiers, can be used to select deployable classifiers. An evaluation of the criterion shows that the least expected cost classifier may not be deployable for a given cost target, while higher expected cost classifiers may be deployable for a given cost target and confidence level.

A game-theoretic model of a dynamic classifier selection-based AV system is proposed. The model takes into account the possible evasion of the selector. A backward induction-based equilibrium solution of the game between adversary and defender gives the optimal configuration of the classifiers in the system, i.e., the one that minimizes the defender's expected cost.

The solutions to these three problems would help in the effective application of machine learning-based methods to malware detection.
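
The deployability idea described in the abstract (a cost target checked against an interval estimate of generalization performance) can be illustrated with a minimal sketch. This is not the dissertation's procedure. It assumes a one-sided normal-approximation upper confidence bound on mean cost, and the function names, cost values, and target below are all hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def cost_upper_bound(costs, confidence=0.95):
    """One-sided upper confidence bound on the mean cost.

    Assumption: a normal approximation to the sampling distribution of the
    mean; the dissertation may use a different interval estimate.
    """
    z = NormalDist().inv_cdf(confidence)
    return mean(costs) + z * stdev(costs) / math.sqrt(len(costs))


def is_deployable(costs, cost_target, confidence=0.95):
    """Hypothetical rule: deployable if the upper bound meets the target."""
    return cost_upper_bound(costs, confidence) <= cost_target


# Hypothetical per-sample misclassification costs on held-out data.
clf_a = [0.0] * 18 + [5.0] * 2   # lower mean cost, few samples, high variance
clf_b = [0.5, 0.7] * 100         # higher mean cost, many samples, low variance

cost_target = 0.7                # hypothetical performance target

for name, costs in (("A", clf_a), ("B", clf_b)):
    print(name, round(mean(costs), 3), is_deployable(costs, cost_target))
```

Under these made-up numbers, classifier A has the lower mean cost but its interval estimate is too wide to meet the target, whereas classifier B meets it. This mirrors the abstract's observation that the least expected cost classifier is not necessarily the deployable one.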

Keywords/Search Tags:

Malware, Machine learning-based, Concept drift, Adversarial, Selection, Deployability

Related items

1	A Comprehensive Study On Learning-based PE Malware Family Classification Methods
2	Research On Adversarial Attacks For Data Stream Concept Drift
3	Research On Malware Detection And Adversarial Sample Generation Based On Machine Learnin
4	Research On Malware Adversarial Attacks And Defense Strategies Against Adversarial Examples
5	Research On Adversarial Sample Generation Method For PE Malware Detection
6	Research And Application Of Inhibiting The Effects Of Concept Drift Based On Machine Learning
7	Research On Competence Model-Based Adaptive Learning Techniques For Handling Concept Drift
8	Android Malware Detection Method Based On Feature Selection
9	Research Of Machine Learning Based Malware Detection And Adversarial Attack Methods
10	Research On Online Learning For Concept Drift