
Research On Technology Of URL Security Detection Based On Machine Learning

Posted on: 2020-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 2428330590473232
Subject: Computer technology
Abstract/Summary:
With the continuous advancement of Internet technology, network applications have become part of everyday life. While Internet technology has improved people's lives, it has also brought new dangers. The damage caused by malicious URLs has made it clear that defending against malicious URL attacks, and reducing the harm they cause, is extremely important. Blacklist filtering emerged in response, but as data sets have grown, simple blacklist detection falls far short of current demands for malicious URL detection. Machine learning algorithms are increasingly applied to the problem, but the accuracy of the models built by different researchers varies, and researchers generally build the detection model on a single machine learning algorithm, which inevitably leads to poor performance under certain conditions.

This paper studies malicious URL detection based on machine learning and constructs a detection model in which multiple classifiers work together. The multi-classifier model is then used to design and implement a malicious URL detection system that processes real-time data streams. The main work of this paper is as follows. First, the required positive and negative data sets are collected through multiple channels, and the collected data are preprocessed by data balancing, replacement of suspected malicious words, and data cleaning. Second, building on existing research into feature extraction for malicious URL detection, a new comprehensive feature extraction scheme is built by adding custom feature items, and TF-IDF and word2vec feature extraction schemes are also implemented. Third, the multi-classifier detection model is constructed. Its three classifiers are a logistic regression model based on comprehensive feature extraction, an SVM model based on TF-IDF feature extraction, and a CNN model based on word2vec feature extraction. The overall performance of the combined model is improved by assigning different weights to the three models and by experimentally adjusting the threshold used to decide that a URL is malicious. Finally, a system for malicious URL detection on real-time data streams is designed and implemented using the proposed multi-classifier detection model, and then tested and analyzed.

The test results show that the malicious URL detection system built on the proposed multi-classifier detection model achieves better classification performance on a test set consisting of Alexa's top 200,000 entries and 40,000 malicious URLs, with good overall performance in terms of correct rate, recall rate, accuracy, false positive rate, and F1 value.
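
The abstract describes combining three classifiers through per-model weights and a tunable decision threshold, but gives no formula or values. The following is a minimal sketch of one common way to do this, assuming each classifier outputs a malicious-probability score in [0, 1] and the scores are fused by a weighted average. The function names, the weights, and the threshold are hypothetical placeholders for illustration, not values taken from the thesis.

```python
from typing import Sequence


def fuse_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted average of per-classifier malicious scores.

    Assumption: each score is a probability-like value in [0, 1].
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


def is_malicious(
    scores: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> bool:
    """Flag a URL as malicious when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


# Hypothetical weights for (logistic regression, SVM, CNN); the thesis
# tunes its own weights experimentally and does not state them here.
WEIGHTS = (0.3, 0.3, 0.4)

# Hypothetical scores for a single URL from the three classifiers.
example_scores = (0.2, 0.7, 0.9)

print(fuse_scores(example_scores, WEIGHTS))
print(is_malicious(example_scores, WEIGHTS, threshold=0.6))
```

Raising the threshold in a scheme like this trades recall for a lower false positive rate, which is one reason the threshold would be adjusted experimentally alongside the weights.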
Keywords/Search Tags:URL, Malicious detection, Logistic regression, SVM, CNN network, Real-time computing