
System Sequence Sample Generation And Anomaly Detection

Posted on: 2022-02-27 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Lu | Full Text: PDF
GTID: 2518306539462904 | Subject: Computer technology
Abstract/Summary:
In popular Internet applications such as IoT and cloud computing, well-applied host-based intrusion detection can greatly improve the security of key network infrastructure. Previous research in host intrusion detection has produced dozens of results, including N-gram sliding window methods, feature extraction based methods, traditional machine learning algorithms, and deep learning based methods. Each of these methods has strengths as well as limitations, such as high training data requirements, high computational cost, and overfitting, which need further study and optimization. In addition, the number and proportion of abnormal samples in intrusion detection datasets are generally very low, which may affect the evaluation of detection methods, so it is necessary to consider oversampling to increase the number of abnormal samples. Moreover, the commonly used SMOTE oversampling method is considered suitable only for continuous data and not for host sequences, so research on oversampling techniques for such sequences is also important. To address host sequence anomaly detection and sample balancing, this thesis builds on a study of current techniques, proposes several methods, and verifies their validity through experiments on the ADFA-LD dataset. The main research content is as follows.

For anomaly detection, two detection methods based on key command extraction are proposed first. The first is based on Lasso key command extraction and addresses the sparse distribution problem in host sequences. Experiments show that Lasso regression helps reduce the dimensionality of host sequence features, achieving better accuracy with a KNN classifier while keeping the feature dimension in a lower range. The second is based on TextRank key command extraction. According to the semantic characteristics of the host sequence, each host sequence is converted to a directed weighted graph, the TextRank scores of all commands in the sequence are calculated, and the key commands of each sequence are extracted according to these scores. Experimental results show that this method improves the efficiency of the anomaly detector compared with the traditional TF-IDF-based method.

To further explore embedding techniques in this field, this thesis proposes a host sequence anomaly detection method based on sequence vectors. Word embeddings of all commands are constructed first, and sequence embedding representations are then modeled. In the experiments, the embedding vectors of both commands and sequences are visualized to explore their features. The anomaly detection tests show that the Seq2Vec method achieves better detection results with a simple 1-NN classifier, especially in false positive rate, which is greatly improved compared with the methods above. The detection results of the two word embedding methods are also compared.

For sample balancing, to address the shortage of abnormal training samples, this thesis introduces the deep convolutional generative adversarial network (DCGAN) model into minority sample generation and proposes a method to further optimize Adam, which in tests increases the convergence frequency and produces more samples. In further experiments, the method is validated with a variety of anomaly detection techniques, compared with other data balancing methods, and evaluated for the detection of generated samples at different degrees of sample balance. The results show that the abnormal samples generated by this method can be effectively identified and that the detection results of various detection methods improve to varying degrees.
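The TextRank key command extraction step described in the abstract (host sequence to directed weighted graph, then per-command scores, then top-ranked commands) can be illustrated with a minimal sketch. This is not the thesis implementation: the co-occurrence window, the damping factor of 0.85, the fixed iteration count, and the names `build_graph`, `textrank`, and `key_commands` are all assumptions made for illustration, and the example tokens are hypothetical.

```python
from collections import defaultdict


def build_graph(sequence, window=2):
    """Directed weighted graph: edge u -> v counts how often v
    follows u within `window` positions (window size is an assumption)."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, src in enumerate(sequence):
        for dst in sequence[i + 1 : i + 1 + window]:
            if dst != src:  # skip self-loops
                graph[src][dst] += 1.0
    return graph


def textrank(sequence, window=2, damping=0.85, iterations=50):
    """Weighted PageRank-style scores for each distinct command.
    Damping and iteration count are conventional defaults, not
    values taken from the thesis."""
    graph = build_graph(sequence, window)
    nodes = sorted(set(sequence))
    out_weight = {u: sum(graph[u].values()) for u in nodes}
    scores = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        new_scores = {}
        for v in nodes:
            # Each predecessor u passes on its score in proportion
            # to the weight of the edge u -> v.
            incoming = sum(
                graph[u][v] / out_weight[u] * scores[u]
                for u in nodes
                if out_weight[u] > 0 and v in graph[u]
            )
            new_scores[v] = (1.0 - damping) + damping * incoming
        scores = new_scores
    return scores


def key_commands(sequence, k=3, window=2):
    """Return the k highest-scoring commands (ties broken by name)."""
    scores = textrank(sequence, window=window)
    ranked = sorted(scores, key=lambda c: (-scores[c], str(c)))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical host sequence; real ADFA-LD traces are system call IDs.
    trace = ["open", "read", "mmap", "read", "write", "read", "close"]
    print(key_commands(trace, k=2))
```

In a detector along the lines the abstract describes, the extracted key commands would then replace the full sequence as the feature input to a classifier; how the thesis does this is not stated in the abstract.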
Keywords/Search Tags: anomaly detection, sample generation, host sequence embedding, Adam optimization