
Robust Machine Learning In Adversarial Environment And Its Applications

Posted on: 2016-10-21    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z M He    Full Text: PDF
GTID: 1108330479993457    Subject: Computer application technology
Abstract/Summary:
Machine learning has been widely used in security applications, for example, intrusion detection, malware detection, spam filtering, and steganalysis. Traditional machine learning assumes that training and test data follow the same distribution. However, this assumption may be violated in security applications, since there usually exists an attacker who can manipulate the training or test data to mislead the classifier's decisions. Recent research has shown that a slight change to the data can significantly degrade the performance of machine learning systems, which threatens the security of these applications. Traditional machine learning is therefore not adequate to deal with adversarial attacks. This study analyzes three kinds of adversarial attacks, namely causative attacks, exploratory attacks, and privacy violations, and develops robust systems against them. Two real-world security applications, steganalysis and web browsing, are also discussed. The major contributions of this thesis are as follows (minimal illustrative sketches of each contribution follow the abstract):

1) Current countermeasures usually sacrifice a classifier's generalization ability in order to increase its robustness to causative attacks. These countermeasures should therefore be applied only if the training set has actually been attacked by an adversary; however, the detection of causative attacks on a dataset had not yet been investigated. Since a causative attack changes the geometrical nature of a dataset, data complexity measures, which describe the geometrical characteristics of data, are applied to causative attack detection. The detection task is formulated as a 2-class classification problem (does the dataset contain an attack?) and a multi-class classification problem (which type of attack does it contain?). Experimental results show that data complexity measures clearly separate untainted datasets from datasets containing different kinds of attacks.

2) Previous studies show that although a one-class classifier is robust to exploratory attacks, its generalization ability is relatively low. By contrast, a two-class classifier has good discriminability but is vulnerable to exploratory attacks. A hybrid method, named the 1.5C classifier, is proposed to defend against exploratory attacks. The proposed model combines one-class and two-class classifiers to learn a decision function that encloses the legitimate samples more tightly in feature space, without significantly compromising accuracy in the absence of attack. The proposed method can be used to improve the security of any classifier at test time, as shown by experiments on spam and malware detection.

3) The performance of current steganalysis methods drops significantly when the quantization tables of the training and test images differ. Unfortunately, covering all possible quantization tables in one system is neither realistic nor practical. The changes of steganalysis features caused by differing quantization tables are formulated as feature perturbations, and a stochastic sensitivity is defined as the expected squared output change of the classifier under these feature perturbations, which quantifies the classifier's robustness. A steganalysis system that minimizes both the training error and the stochastic sensitivity is proposed to improve robustness to differing quantization tables.
The results indicate that the proposed method is robust to differences between the quantization tables of the training and test images.

4) Most recent research focuses on leak quantification in web applications, which may be impractical for web browsing due to its time complexity and restrictive assumptions. The information leaks of web browsing are first analyzed from the viewpoint of machine learning and quantified by data complexity measures. The ability of these measures to represent information leaks is evaluated experimentally and compared with existing approaches. Moreover, a parameter selection model based on this leak quantification is proposed to estimate suitable parameters for website fingerprinting countermeasures. The experimental results confirm that countermeasures whose parameters are selected according to the data complexity measures are more secure than those tuned by other leak quantification measures.
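Sketch for contribution 1: a minimal illustration of detecting causative attacks via data complexity measures, assuming Fisher's discriminant ratio (the classical F1 measure) as the complexity statistic and a generic meta-classifier over dataset signatures. The specific measures and detector used in the thesis are not fixed by this abstract, so these choices are illustrative assumptions.

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-feature Fisher discriminant ratio (F1) of a two-class dataset.

    F1_j = (mu0_j - mu1_j)^2 / (var0_j + var1_j); larger values mean the
    classes separate more cleanly along feature j. A causative attack that
    shifts the training data also shifts these geometric statistics.
    """
    X0, X1 = X[y == 0], X[y == 1]                   # assumes labels in {0, 1}
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12   # guard against /0
    return num / den

def complexity_signature(X, y):
    """Summarize a dataset as a few scalar complexity statistics."""
    f1 = fisher_ratio(X, y)
    return np.array([f1.max(), f1.mean(), f1.min()])

# Usage: compute signatures for many clean and poisoned training sets, then
# train a standard 2-class (attacked or not) or multi-class (which attack)
# classifier on those signatures:
#   S = np.stack([complexity_signature(X, y) for X, y in datasets])
#   meta_clf.fit(S, attack_labels)
```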
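Sketch for contribution 2: the 1.5C idea combines one-class and two-class learners. Assuming scikit-learn SVMs and a simple AND-combination of their decisions, a sample is accepted as legitimate only if it is both classified as legitimate and lies inside the envelope learned on legitimate data. The thesis learns a joint decision function; the conjunction rule here is an illustrative stand-in.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

class Hybrid15C:
    """A 1.5C-style hybrid: two-class discriminability plus a one-class
    envelope that tightly encloses the legitimate samples."""

    def __init__(self):
        self.two_class = SVC(kernel="rbf", gamma="scale")
        self.one_class = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)

    def fit(self, X, y):
        # y == 1 marks legitimate samples, y == 0 malicious ones.
        self.two_class.fit(X, y)
        self.one_class.fit(X[y == 1])   # envelope of legitimate data only
        return self

    def predict(self, X):
        # Legitimate (1) only if the two-class SVM says legitimate AND the
        # sample falls inside the one-class envelope; an exploratory sample
        # pushed far from the legitimate region fails the second test.
        two = self.two_class.predict(X) == 1
        one = self.one_class.predict(X) == 1
        return (two & one).astype(int)
```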
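Sketch for contribution 3: the stochastic sensitivity E[(f(x + Δx) − f(x))²] can be estimated by Monte Carlo over random feature perturbations. The Gaussian perturbation model and draw counts below are assumptions for illustration; the thesis derives the perturbations from quantization-table differences.

```python
import numpy as np

def stochastic_sensitivity(f, X, sigma=0.05, n_draws=100, seed=None):
    """Monte Carlo estimate of E[(f(x + dx) - f(x))^2], averaged over X.

    f       : callable mapping an (n, d) array to (n,) real outputs
    X       : (n, d) array of steganalysis feature vectors
    sigma   : std of the Gaussian perturbation modeling table mismatch
    n_draws : Monte Carlo draws per sample
    """
    rng = np.random.default_rng(seed)
    base = f(X)
    total = 0.0
    for _ in range(n_draws):
        dx = rng.normal(0.0, sigma, size=X.shape)
        total += np.mean((f(X + dx) - base) ** 2)
    return total / n_draws

# A robust steganalysis objective in the spirit of contribution 3 would
# minimize  training_error + lam * stochastic_sensitivity(f, X_train),
# so the learned f changes little when quantization tables differ.
```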
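Sketch for contribution 4: leak-driven parameter selection can be cast as a grid search that keeps the countermeasure parameter minimizing a data-complexity leak score, since lower class separability between websites means a smaller leak. The apply_countermeasure helper and the one-vs-rest Fisher-ratio score are hypothetical placeholders for the thesis's measures.

```python
import numpy as np

def leak_score(X, y):
    """Proxy for information leak: mean one-vs-rest Fisher ratio.

    Higher separability between websites (classes) means an eavesdropper
    can fingerprint pages more easily, i.e. a larger leak.
    """
    scores = []
    for c in np.unique(y):
        Xc, Xr = X[y == c], X[y != c]
        num = (Xc.mean(axis=0) - Xr.mean(axis=0)) ** 2
        den = Xc.var(axis=0) + Xr.var(axis=0) + 1e-12
        scores.append((num / den).mean())
    return float(np.mean(scores))

def select_parameter(X, y, apply_countermeasure, grid):
    """Return the countermeasure parameter with the smallest leak.

    apply_countermeasure(X, p) is a hypothetical function applying a
    padding/morphing defense with parameter p to the traffic features X.
    """
    return min(grid, key=lambda p: leak_score(apply_countermeasure(X, p), y))
```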
Keywords/Search Tags: Adversarial learning, Robust system, Causative attack, Exploratory attack, Privacy violation, Website fingerprinting countermeasure, Steganalysis