Research And Implementation Of Network Intrusion Assistant Forensics System Based On Ensemble Learning

Posted on: 2022-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xiao
GTID: 2518306506996379
Subject: Computer technology
Abstract/Summary:
Network intrusion poses a serious threat to personal privacy and property, and can even undermine national security and stability. Network intrusion forensics analyzes network traffic data to reconstruct intrusion behavior and to find intrusion traces that can serve as evidence in court. However, the feature redundancy and class imbalance of high-dimensional, massive data pose a major challenge to traditional forensics techniques. Applying data mining techniques to network intrusion forensics therefore has both theoretical research value and practical significance. Building on data cleaning techniques and feature selection methods, this thesis models the experimental data, selects the optimal model, and designs and implements a network intrusion assisted forensics system based on ensemble learning. The main work is as follows:
1. After cleaning the original data, the SMOTE and EasyEnsemble sampling methods were each used to address the class imbalance problem, and a logistic regression model was built directly on the resampled data. By comparing the ROC curves and AUC values of the models on the test dataset, the EasyEnsemble sampling method was chosen, which effectively weakened the negative effect of class imbalance.
2. Feature ranking and feature search methods were used for feature selection. Redundant features were filtered out with the Pearson correlation coefficient, and the importance of feature subsets was measured with the Random Forest algorithm. Combined with a greedy strategy, a Sequential Backward Selection (SBS) algorithm was then implemented, so that the feature subset is selected effectively and the data dimension is greatly reduced.
3. Ensemble learning algorithms such as Random Forest, XGBoost and LightGBM were used to model the experimental data and make predictions. The optimal model was selected through model evaluation, and XGBoost and LightGBM were then integrated with a soft weighted voting strategy, further improving model performance. In addition, a model based on L1-regularized feature selection was built, and a comparative study demonstrated the effectiveness of the feature selection used in this thesis. A cost-sensitive AdaCost model was also built as an alternative scheme, in which Precision, Recall and F-Measure vary with the cost factor, so that the preference for Precision or Recall in different situations can be met.
4. Because the XGBoost model has many parameters that are hard to tune and carries a high risk of over-fitting, a Genetic Algorithm (GA) was introduced to optimize the model parameters. Compared with the model using default parameters and the model tuned by random search, the GA-XGBoost model achieved clearly improved performance.
5. Using the most effective network intrusion forensics model, a network intrusion forensics system was designed and implemented on the Django framework, as an exploratory improvement on existing professional forensics systems. Besides basic functions such as user management and data management, the system introduces a risk assessment model that uses each sample's prediction result from the ensemble learning model to assess its risk, assign it a risk grade, and give advice that supports decision-making. The system lets forensic investigators filter massive data rapidly and locate suspicious samples precisely, shortening the forensics cycle and strengthening the reliability of forensic decisions.
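SMOTE, one of the two sampling methods compared in task 1, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The plain-Python sketch below shows only that interpolation step; the function name `smote`, the Euclidean distance and the default `k=5` are illustrative assumptions rather than details taken from the thesis. In practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built by SMOTE-style interpolation (sketch).

    minority: list of feature vectors (lists of floats) from the minority class;
    it must contain at least two samples.
    """
    rng = random.Random(seed)

    def dist2(a, b):
        # Squared Euclidean distance (sufficient for ranking neighbours)
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x, excluding x itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: dist2(x, m))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between x and the chosen neighbour
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE enlarges the minority class without leaving its region of feature space, which is also why it can blur class boundaries when minority samples sit among majority samples.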
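EasyEnsemble, the method finally chosen in task 1, handles imbalance by under-sampling instead of synthesis: it draws several independent balanced subsets, each pairing all minority samples with an equally sized random draw from the majority class, trains one base learner per subset and combines their outputs. The hypothetical helper below only builds the index subsets; the choice of base learner and the way outputs are combined are left out. imbalanced-learn also provides an `EasyEnsembleClassifier` if a ready-made version is preferred.

```python
import random


def easy_ensemble_subsets(majority_idx, minority_idx, n_subsets, seed=0):
    """Build balanced training index sets in the EasyEnsemble style (sketch).

    Each subset keeps every minority index and adds an equally sized random
    sample, drawn without replacement, from the majority indices. Requires
    len(majority_idx) >= len(minority_idx).
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        # Independent under-sample of the majority class for this subset
        sampled = rng.sample(list(majority_idx), len(minority_idx))
        subsets.append(sorted(sampled + list(minority_idx)))
    return subsets
```

Unlike a single random under-sample, drawing several subsets means most majority samples are seen by at least one base learner, so less information is discarded.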
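Task 1 compares the sampling methods by ROC and AUC on the test set. AUC equals the probability that a randomly chosen positive (intrusion) sample receives a higher score than a randomly chosen negative one, with ties counted as one half, which gives the short exact implementation below. The pairwise loop is quadratic and only meant to illustrate the definition; for large test sets `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.

```python
def roc_auc(labels, scores):
    """Exact AUC via the pairwise (Mann-Whitney) formulation.

    labels: 1 for positive (intrusion), 0 for negative (normal).
    scores: predicted scores or probabilities, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```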
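The Pearson filter in task 2 removes a feature when it is strongly linearly correlated with a feature already kept, since the two carry largely the same information. The sketch below is a greedy version of that idea; the threshold of 0.9 and the keep-the-first-seen ordering are assumptions for illustration, as the thesis does not state its cut-off here.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return cov / (sa * sb)


def drop_redundant(columns, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    columns: dict mapping feature name -> list of values (all the same length).
    Returns the list of kept feature names, in the dict's order.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


print(drop_redundant({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}))
```

Here `b` is an exact multiple of `a` (r = 1) and is dropped, while `c` (r = -0.4 with `a`) is kept.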
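Task 2 combines a greedy search with a scoring criterion to perform Sequential Backward Selection: start from the full feature set and repeatedly remove the single feature whose removal yields the best score, keeping track of the best subset seen. In the sketch below the criterion is passed in as `score`, a hypothetical stand-in for whatever the thesis uses (its criterion involves Random Forest importance); cross-validated accuracy of a model trained on the subset would be another common choice.

```python
def sbs(features, score, k_min=1):
    """Greedy Sequential Backward Selection (sketch).

    features: iterable of feature names.
    score: callable mapping a tuple of feature names to a number (higher is better).
    Returns (best_subset, best_score) over all subsets visited.
    """
    current = list(features)
    best_subset = tuple(current)
    best_score = score(best_subset)
    while len(current) > k_min:
        # Try removing each remaining feature in turn
        candidates = [tuple(f for f in current if f != drop) for drop in current]
        scores = [score(c) for c in candidates]
        i = max(range(len(candidates)), key=scores.__getitem__)
        current = list(candidates[i])  # commit to the best single removal
        if scores[i] > best_score:
            best_subset, best_score = candidates[i], scores[i]
    return best_subset, best_score


# Hypothetical score: only "a" and "b" are informative, each feature costs 0.5
useful = {"a": 3, "b": 2}
toy_score = lambda subset: sum(useful.get(f, 0) for f in subset) - 0.5 * len(subset)
print(sbs(["a", "b", "c", "d"], toy_score))
```

Each pass evaluates one candidate per remaining feature, so SBS costs on the order of d² score evaluations for d features, far fewer than the 2^d subsets an exhaustive search would need, at the price of possibly missing the global optimum.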
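Soft weighted voting, used in task 3 to integrate XGBoost and LightGBM, averages the models' predicted class probabilities with per-model weights and then thresholds the result. The sketch below works on plain probability lists so it does not depend on either library; the weights and the 0.5 threshold in the example are hypothetical values, since the abstract does not give the ones used.

```python
def soft_weighted_vote(prob_lists, weights, threshold=0.5):
    """Weighted average of per-model intrusion probabilities, plus hard labels.

    prob_lists[m][i]: probability from model m that sample i is an intrusion.
    weights[m]: non-negative weight of model m (normalised internally).
    """
    total = sum(weights)
    n = len(prob_lists[0])
    fused = [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
             for i in range(n)]
    labels = [1 if p >= threshold else 0 for p in fused]
    return fused, labels


# Hypothetical outputs of two models on three samples
xgb_probs = [0.9, 0.2, 0.6]
lgb_probs = [0.7, 0.4, 0.2]
print(soft_weighted_vote([xgb_probs, lgb_probs], weights=[0.6, 0.4]))
```

Averaging probabilities rather than hard votes lets a confident model outweigh a hesitant one; in the third sample above, the 0.2 from the second model pulls the fused score below the threshold.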
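The AdaCost alternative in task 3 modifies AdaBoost's sample re-weighting with a cost-adjustment function, so that costly samples (for example, intrusions, if a cost factor is assigned to that class) lose weight more slowly when classified correctly and gain weight faster when misclassified. The sketch below shows one re-weighting round using the commonly cited adjustment β = 0.5 − 0.5c for correct predictions and β = 0.5 + 0.5c for errors, with labels and predictions in {−1, +1} and costs in [0, 1]; how the thesis maps its cost factor to per-sample costs is not stated, so that mapping is left to the caller.

```python
import math


def adacost_update(weights, y, h, costs, alpha):
    """One AdaCost-style re-weighting step (sketch).

    weights: current sample weights; y, h: true labels and predictions in {-1, +1};
    costs: per-sample misclassification costs in [0, 1]; alpha: weak-learner weight.
    Returns the new weights, normalised to sum to 1.
    """
    new = []
    for w, yi, hi, c in zip(weights, y, h, costs):
        if yi * hi > 0:
            beta = 0.5 - 0.5 * c  # correct: costly samples shrink less
        else:
            beta = 0.5 + 0.5 * c  # wrong: costly samples grow more
        new.append(w * math.exp(-alpha * yi * hi * beta))
    z = sum(new)  # normalisation constant
    return [v / z for v in new]


print(adacost_update([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, 1], [1.0, 1.0, 0.2, 0.2], alpha=0.5))
```

Raising the costs of one class shifts later weak learners toward that class, which trades Precision for Recall on it; this is the behaviour the abstract describes when the cost factor is varied.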
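Task 4 tunes XGBoost hyperparameters with a Genetic Algorithm. The sketch below is a generic GA over a discrete search space, using binary tournament selection, uniform crossover, random-reset mutation and elitism; these operator choices, the search space and the `toy_fitness` function are all hypothetical. In the thesis setting, the fitness would be a validation metric of an XGBoost model trained with the candidate parameters, which is not reproduced here.

```python
import random


def genetic_search(fitness, space, pop_size=10, generations=15,
                   mutation_rate=0.2, seed=0):
    """Maximise fitness(params) over a discrete hyperparameter space (sketch).

    space: dict mapping parameter name -> list of allowed values.
    fitness: callable taking a params dict and returning a number (higher is better).
    """
    rng = random.Random(seed)
    keys = list(space)

    def random_individual():
        return {k: rng.choice(space[k]) for k in keys}

    def tournament(population):
        # Binary tournament: the fitter of two random individuals becomes a parent
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [random_individual() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [dict(best)]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            # Uniform crossover: each gene is copied from a randomly chosen parent
            child = {k: (p1 if rng.random() < 0.5 else p2)[k] for k in keys}
            # Mutation: occasionally resample a gene from its allowed values
            for k in keys:
                if rng.random() < mutation_rate:
                    child[k] = rng.choice(space[k])
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Hypothetical search space and fitness, for illustration only
SPACE = {"max_depth": [3, 4, 5, 6, 7, 8],
         "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
         "subsample": [0.6, 0.8, 1.0]}


def toy_fitness(p):
    return (-(p["max_depth"] - 6) ** 2 - 100 * (p["learning_rate"] - 0.1) ** 2
            - 10 * (p["subsample"] - 0.8) ** 2)


print(genetic_search(toy_fitness, SPACE, pop_size=20, generations=30))
```

Because of elitism the best fitness never decreases across generations, and each fitness call is a full model training in the real setting, so population size and generation count directly set the tuning budget, much as the number of iterations does for the random search baseline.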
Keywords/Search Tags: Network intrusion forensics, Feature selection, SBS, EasyEnsemble, Ensemble learning