Webshell Detection Method And Implementation Based On Machine Learning Combination Algorithms

Posted on: 2020-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: J T Li
Full Text: PDF
GTID: 2428330572491633
Subject: Applied Mathematics
Abstract/Summary:
A Webshell is a command execution program in the form of a web page file, also known as a backdoor file, and it is one of the main means by which attackers compromise websites. Because Webshells cause great harm, Webshell detection has become an important research direction in network security. Many research results on Webshell detection already exist, including commonly used detection tools such as "safe dog", "D shield" and "hippo". Most of these tools rely on signature-based detection methods, and their accuracy and recall rate are not ideal. Among machine learning detection methods, most are based on a single machine learning algorithm, so accuracy and recall can only be improved through feature construction, algorithm improvement or parameter optimization.

Based on an analysis of the current state of Webshell detection, this thesis proposes a Webshell detection method based on a combination of machine learning algorithms and implements it in the Python programming language. Although machine learning algorithms differ in accuracy and recall rate, there is no definite inclusion relationship among their detection results: the samples detected by one algorithm are not necessarily a subset of those detected by another. Combining two or more machine learning detection algorithms can therefore improve the recall rate, but it inevitably lowers accuracy and detection efficiency. Because emergency response and similar work require a high recall rate (zero tolerance for missed reports), some accuracy and efficiency can be sacrificed to improve recall. By testing various combinations of machine learning detection algorithms, this thesis seeks an optimal combination that achieves a higher recall rate while keeping an acceptable accuracy.

First, a Webshell detection system based on machine learning is designed and implemented using Python and an SQLite database. The system consists of a machine learning training subsystem and a machine learning detection subsystem. Its modules are a string reading module, a manual classification module, a text feature extraction module, a database module, a machine learning training module, a file scanning module, an automatic classifier module and a document location traceability module. Second, the differences among text feature extraction algorithm models are analyzed, and the text feature extraction algorithm is optimized to improve the accuracy and recall rate of machine learning. Text feature extraction is a key technology in machine learning, and the extracted features directly affect the final judgment. Finally, the accuracy and recall of the K-nearest neighbor, Naive Bayes, Decision Tree, Random Forest, Logistic Regression, Support Vector Machine (SVM) and deep learning MLP algorithms are tested experimentally, both in individual detection and in combined detection. Comparative analysis of the results shows that the combination of the Naive Bayes and Decision Tree algorithms reaches a recall rate of 99.918% (about 1 missed report per 1000 documents) with an accuracy rate of 88.6%. This basically meets the requirements of emergency response for recall rate and accuracy.
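The trade-off behind combining detectors can be illustrated with a minimal Python sketch. The abstract does not state the exact combination rule, so the sketch assumes a simple "flag if any detector flags" (logical OR) rule. The function names, the toy labels and the two prediction lists are hypothetical and are not data from the thesis. The sketch also reports precision as one possible reading of the "accuracy" figure, which is likewise an assumption.

```python
def combine_any(*prediction_lists):
    """Assumed OR rule: mark a file as Webshell (1) if any detector marks it."""
    return [int(any(votes)) for votes in zip(*prediction_lists)]


def recall(y_true, y_pred):
    """Fraction of real Webshells that were flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred):
    """Fraction of flagged files that really are Webshells."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data: 1 = Webshell, 0 = benign file.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
nb_pred = [1, 1, 0, 0, 1, 0, 0, 0]    # made-up output of a Naive Bayes detector
tree_pred = [0, 1, 1, 1, 0, 1, 0, 0]  # made-up output of a Decision Tree detector

combined = combine_any(nb_pred, tree_pred)

for name, pred in [("naive bayes", nb_pred),
                   ("decision tree", tree_pred),
                   ("combined (OR)", combined)]:
    print(f"{name:14s} recall={recall(y_true, pred):.3f} "
          f"precision={precision(y_true, pred):.3f}")
```

In this toy data each detector misses Webshells that the other one catches, so the OR combination recovers all of them, while the false alarms of both detectors accumulate and precision falls below that of the better single detector. This mirrors the trade-off described in the abstract, under the stated assumptions.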
Keywords/Search Tags:Machine Learning, Webshell, Text Feature Extraction, Decision Tree, Naive Bayes