
Application of Random Forest in Cloud Computing Anomaly Detection

Posted on: 2021-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Liu
GTID: 2518306308966789
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, with the rapid development of cloud computing, intrusions have become frequent in cloud environments, and traditional network security measures struggle to meet the security detection requirements of the cloud. To address network security in cloud environments, researchers have proposed anomaly detection techniques based on machine learning. Among machine learning algorithms, BP neural networks, k-means clustering, support vector machines (SVM) and similar algorithms are widely used for anomaly detection, but they have high complexity and weak generalization ability, and their detection time grows long when the data volume is large. Random forest is a typical application of Bagging, an ensemble learning approach: each weak learner (a decision tree) produces its own classification result, the weak learners are built in parallel and combined into a strong learner, and the final class of an input sample is decided by voting. In this thesis, after studying anomaly detection and random forests, we designed a system that uses random forest for anomaly detection. We collected Windows audit logs as the dataset for our anomaly detection model and evaluated the detection results using accuracy, precision, recall and F1 score.
The main achievements and innovations of this thesis are as follows:
(1) Data preprocessing: We first analyzed the meaning of the data in depth, then applied feature importance metrics to the multidimensional time series to identify the crucial fields that play a key role in the classification results. By encoding these fields, we transformed the multidimensional series into feature vectors composed of 0s and 1s, which serve as the input dataset of the random forest anomaly detection model. This greatly reduces model complexity: training on a dataset of 400,000 samples takes about 5 s, which meets the requirement of real-time detection.
(2) Random forest model: We designed a random forest anomaly detection system for the cloud environment and trained it on our dataset. We then tuned the number of decision trees in the forest, the number of features considered when splitting a tree node, the maximum tree depth and other important parameters, found the best parameter values, and achieved high classification accuracy on the test dataset.
(3) Comparison: Using the same dataset, we ran random forest, AdaBoost, SVM and k-means in our anomaly detection system and compared the results. In training and detection time on the 400,000-sample dataset, random forest takes about 5 s, AdaBoost about 50 s and SVM about 3 hours; in classification accuracy, random forest also outperforms AdaBoost, k-means and SVM.
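The Bagging and voting scheme described above, together with the three parameters tuned in (2), can be illustrated with a minimal, dependency-free Python sketch. It assumes binary (0/1) feature vectors like those produced in (1) and Gini impurity as the split criterion; the names `RandomForestSketch`, `build_tree`, `n_trees`, `max_features` and `max_depth` are hypothetical and do not come from the thesis, which does not describe its implementation.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(X, y, max_depth, max_features, rng, depth=0):
    """Grow a decision tree on 0/1 feature vectors.

    A node is ("leaf", label) or ("split", feature, left, right); the left
    child handles rows with x[feature] == 0, the right child x[feature] == 1.
    """
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return ("leaf", majority)
    # Random-forest step: only a random subset of features is tried at each split.
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), min(max_features, n_features))
    best_score, best_feature = None, None
    for f in candidates:
        left = [label for row, label in zip(X, y) if row[f] == 0]
        right = [label for row, label in zip(X, y) if row[f] == 1]
        if not left or not right:
            continue  # this feature does not split the node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best_score is None or score < best_score:
            best_score, best_feature = score, f
    if best_feature is None:
        return ("leaf", majority)
    children = []
    for bit in (0, 1):
        idx = [i for i, row in enumerate(X) if row[best_feature] == bit]
        children.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                   max_depth, max_features, rng, depth + 1))
    return ("split", best_feature, children[0], children[1])


def predict_tree(node, x):
    """Follow splits from the root down to a leaf and return its label."""
    while node[0] == "split":
        _, f, left, right = node
        node = right if x[f] == 1 else left
    return node[1]


class RandomForestSketch:
    """Bagged decision trees with per-split feature subsampling and majority voting."""

    def __init__(self, n_trees=10, max_features=2, max_depth=5, seed=0):
        self.n_trees = n_trees
        self.max_features = max_features
        self.max_depth = max_depth
        self.seed = seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            # Bagging: each tree is trained on a bootstrap sample drawn with replacement.
            idx = [rng.randrange(n) for _ in range(n)]
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, self.max_features, rng))
        return self

    def predict(self, X):
        # Every tree votes; the most common label is the forest's prediction.
        return [Counter(predict_tree(t, x) for t in self.trees).most_common(1)[0][0]
                for x in X]
```

For example, `RandomForestSketch(n_trees=100, max_features=4, max_depth=10).fit(X_train, y_train).predict(X_test)` trains 100 bootstrapped trees and labels each test vector by majority vote. Setting `max_features` below the number of features makes each split consider a different random subset, which decorrelates the trees; this per-split subsampling is what distinguishes a random forest from plain bagged trees.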
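The four evaluation measures can be written out explicitly. The sketch below assumes, since the thesis does not say, that label 1 marks an anomalous sample; with TP, FP, FN and TN counted against that class, precision is TP/(TP+FP), recall is TP/(TP+FN), and F1 is the harmonic mean of precision and recall. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary detection task.

    `positive` is the label treated as the anomaly class (assumption: 1 = anomalous).
    Undefined ratios (division by zero) are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For instance, with `y_true = [1, 1, 1, 0, 0, 0, 0, 0]` and `y_pred = [1, 1, 0, 1, 0, 0, 0, 0]` there are 2 true positives, 1 false positive, 1 false negative and 4 true negatives, so accuracy is 0.75 while precision, recall and F1 are all 2/3. Reporting precision and recall alongside accuracy matters because anomalies are usually rare, so a detector that labels everything normal can still reach high accuracy.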
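The thesis does not state which feature importance metric was used to select the key fields. As one simple stand-in, the sketch below scores each binary column by the Gini impurity decrease of a single split on that column and ranks the columns; random forest libraries commonly report a related quantity, the impurity decrease accumulated over all splits and averaged over all trees (for example scikit-learn's `feature_importances_`). The function names `gini_gain` and `rank_features` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(X, y, f):
    """Impurity decrease obtained by splitting all rows on binary column f."""
    left = [label for row, label in zip(X, y) if row[f] == 0]
    right = [label for row, label in zip(X, y) if row[f] == 1]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
    return gini(y) - weighted


def rank_features(X, y, names):
    """Return (name, gain) pairs sorted from most to least informative.

    Ties keep their original column order because Python's sort is stable.
    """
    scores = [(name, gini_gain(X, y, f)) for f, name in enumerate(names)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Columns at the top of the ranking are candidates for the crucial fields, and low-scoring columns can be dropped before training to shrink the input vector. A single-split score ignores interactions between fields, so it would normally be cross-checked against the forest's own importance estimates.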
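The encoding step in (1), which turns the selected log fields into a vector of 0s and 1s, can be sketched as one-hot encoding. The field names (`EventID`, `LogonType`) and values used below are hypothetical examples of Windows audit log fields, not the fields actually selected in the thesis, and collapsing a window of consecutive records with a bitwise OR is an assumption about how the multidimensional series becomes a single vector. The function names are likewise hypothetical.

```python
def build_vocabulary(records, fields):
    """Map each (field, value) pair seen in the training records to a column index.

    A missing field is recorded as value None, so "field absent" gets its own column.
    """
    vocab = {}
    for record in records:
        for field in fields:
            key = (field, record.get(field))
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def encode(record, fields, vocab):
    """One-hot encode the selected fields of one record into a 0/1 vector.

    Values not present in the vocabulary leave their field's columns at 0.
    """
    vec = [0] * len(vocab)
    for field in fields:
        idx = vocab.get((field, record.get(field)))
        if idx is not None:
            vec[idx] = 1
    return vec


def encode_window(window, fields, vocab):
    """Collapse a window of consecutive records into one 0/1 vector (bitwise OR)."""
    vec = [0] * len(vocab)
    for record in window:
        for i, bit in enumerate(encode(record, fields, vocab)):
            vec[i] |= bit
    return vec
```

Given the training records `{"EventID": "4624", "LogonType": "2"}` and `{"EventID": "4625", "LogonType": "3"}` with `fields = ["EventID", "LogonType"]`, the vocabulary has four columns, one per observed (field, value) pair, so the second record encodes to `[0, 0, 1, 1]` and a window containing both records to `[1, 1, 1, 1]`.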
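The thesis reports tuning the number of trees, the number of features considered at each split and the maximum tree depth, without naming a search procedure. An exhaustive grid search is one common way to do this; the sketch below assumes a caller-supplied `evaluate` function that returns a validation score (higher is better) for a given parameter combination. The names `grid_search` and `evaluate` and the parameter keys are hypothetical.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid`; return (best_params, best_score).

    `param_grid` maps a parameter name to a list of candidate values, and
    `evaluate` maps a params dict to a validation score (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # keep the first combination on ties
            best_params, best_score = params, score
    return best_params, best_score
```

In practice, `evaluate` would train a random forest with the given parameters on a training split and return, for example, its accuracy or F1 on a separate validation split, so that the test set is used only once, after the parameters are fixed. The number of combinations grows multiplicatively (a 3 × 2 × 3 grid already means 18 training runs), which stays affordable when a single training run takes about 5 s, as reported above.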
Keywords/Search Tags: cloud computing, machine learning, anomaly detection, random forest, XGBoost