Research On Malicious URL Detection Based On Machine Learning

Posted on:2022-12-25

Degree:Master

Type:Thesis

Country:China

Candidate:P Su

Full Text:PDF

GTID:2518306764479384

Subject:Automation Technology

Abstract/Summary:

PDF Full Text Request

With the development of Internet technology, computers have greatly facilitated our lives, but they have also brought many new web security risks. In the field of web security, URLs in HTTP messages have been an important carrier for network attacks: hackers can use URLs to carry out phishing, cross-site scripting (XSS), SQL injection, and other attacks. URLs that attempt to attack other computers are known as malicious URLs, and they cause serious damage to individuals, societies, and even countries. It is therefore necessary to detect URLs effectively, but blacklists and other simple traditional detection methods are no longer effective against ever-changing attack techniques. Driven by big data and improved hardware computing performance, applying machine learning algorithms to URL detection has become feasible. Existing machine learning detection algorithms often rely on a single model, which attackers can easily bypass or even defeat. This thesis studies the key technologies of malicious URL detection and their applications, drawing on and incorporating relevant research results from natural language processing. The main work and contributions of this thesis are as follows:
1. A large number of datasets are collected from real web servers and open-source communities, and the raw data is processed through operations such as cleaning and data balancing.
2. Based on empirical features and TF-IDF statistical feature extraction, three URL detection models based on traditional machine learning are studied and implemented: SVM, decision tree, and random forest.
3. URLs are segmented at the word level according to special characters, and each URL is vectorized by mapping its words into a vector space with a Word2Vec model. LSTM-attention and TextCNN deep learning models are then each used to perform URL detection.
4. To address the poor performance of the original LSTM network on longer URLs, a URL detection model based on an AFSLSTM network is proposed, which can autonomously extract the features most relevant to the classification task. Training and testing show that the model achieves high detection accuracy.
5. A new URL feature representation method is proposed: average pooling is applied to the original URL vector matrix to obtain a feature vector that reflects global features. A one-dimensional CNN then extracts higher-level features from this vector, and finally a fully connected network performs the classification.
6. Building on the AFSLSTM network and the global-feature CNN network, a fused deep learning model for judging the malicious tendency of URLs is proposed and implemented, and it is shown to detect malicious URLs more effectively.
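The abstract's first contribution mentions data balancing but does not say how it was done. As a minimal sketch, assuming random undersampling of the larger classes (one common choice, not necessarily the thesis's method), a labeled URL set can be balanced as follows. The function name `undersample` and the `(url, label)` pair format are hypothetical.

```python
import random
from collections import defaultdict


def undersample(samples, seed=0):
    """Randomly undersample every class down to the size of the smallest class.

    `samples` is a list of (url, label) pairs (hypothetical format);
    returns a new, shuffled list with equal counts per label.
    """
    by_label = defaultdict(list)
    for url, label in samples:
        by_label[label].append((url, label))
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed for reproducible splits
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))  # sampling without replacement
    rng.shuffle(balanced)
    return balanced
```

Undersampling discards data from the majority class; oversampling the minority class is the usual alternative when the dataset is small.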
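The abstract describes segmenting URLs into words at special characters and extracting TF-IDF statistical features. The sketch below illustrates both steps under stated assumptions: "special characters" is taken to mean any non-alphanumeric character, and TF-IDF uses tf = count / length and idf = log(N / df) + 1, which is one common variant. The exact delimiter set and weighting used in the thesis are not given. `tokenize_url` and `tfidf` are hypothetical names.

```python
import math
import re
from collections import Counter

# Assumption: any run of non-alphanumeric characters ('/', '.', '?', '=', '&', ...)
# acts as a word boundary; the thesis does not list its delimiter set.
_DELIMS = re.compile(r"[^A-Za-z0-9]+")


def tokenize_url(url: str) -> list[str]:
    """Split a URL into lowercase word-level tokens on special characters."""
    return [tok for tok in _DELIMS.split(url.lower()) if tok]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF weights per document: tf = count / len(doc), idf = log(N / df) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append({tok: (c / total) * (math.log(n / df[tok]) + 1.0)
                        for tok, c in counts.items()})
    return vectors
```

For example, `tokenize_url("http://a.com/x?q=1")` returns `['http', 'a', 'com', 'x', 'q', '1']`. The same token lists can also serve as the "sentences" fed to a Word2Vec model.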
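The LSTM-attention model named in the abstract weights LSTM outputs so that the most relevant tokens dominate the URL representation. The abstract does not describe the AFSLSTM architecture, so the sketch below is *not* AFSLSTM. It shows only a generic dot-product attention pooling step over hidden states $h_t$: $\alpha = \mathrm{softmax}(h_t \cdot u)$ and $c = \sum_t \alpha_t h_t$. The context vector `context` (here $u$) stands in for a learned parameter and is hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Dot-product attention pooling over LSTM outputs.

    hidden_states: list of T vectors h_t, each of length d
    context: query vector u of length d (a learned parameter in practice)
    Returns (weights, pooled) with pooled = sum_t alpha_t * h_t.
    """
    scores = [sum(h_i * u_i for h_i, u_i in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    d = len(hidden_states[0])
    pooled = [sum(w * h[j] for w, h in zip(weights, hidden_states)) for j in range(d)]
    return weights, pooled
```

Because the weights sum to 1, the pooled vector is a convex combination of the hidden states. This lets long URLs be summarized without relying only on the final LSTM state.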
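The fifth contribution describes a pipeline of three stages. First, average-pool the URL's word-vector matrix into one global vector. Second, apply a one-dimensional convolution to that vector. Third, classify with a fully connected layer. A minimal sketch of the data flow follows, assuming a single convolution filter with ReLU and a single sigmoid output unit. A real model would use many filters and trained weights. The kernel, weights, and function names are hypothetical.

```python
import math


def average_pool(matrix):
    """Column-wise mean of an (n_tokens x dim) URL matrix -> global vector of length dim."""
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]


def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1D cross-correlation of x with one kernel, followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + t] * kernel[t] for t in range(k)) + bias)
            for i in range(len(x) - k + 1)]


def dense_sigmoid(features, weights, bias=0.0):
    """One fully connected output unit giving a malicious-class probability."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def global_cnn_score(url_matrix, kernel, weights, bias=0.0):
    """Average pooling -> 1D convolution -> fully connected classifier."""
    pooled = average_pool(url_matrix)
    return dense_sigmoid(conv1d_relu(pooled, kernel), weights, bias)
```

Pooling first makes the representation independent of URL length: the convolution always sees a vector of the embedding dimension, whatever the number of tokens.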
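The final contribution fuses the AFSLSTM and global-feature CNN models, but the abstract does not state the fusion rule. One simple possibility is late fusion by a weighted average of the two models' malicious-class probabilities. The sketch below illustrates that idea only, not the thesis's method. The `weight` and `threshold` values are hypothetical hyperparameters.

```python
def fuse(p_lstm: float, p_cnn: float, weight: float = 0.5, threshold: float = 0.5):
    """Late fusion of two malicious-class probabilities by weighted averaging.

    weight and threshold are hypothetical hyperparameters.
    Returns (fused_probability, is_malicious).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    fused = weight * p_lstm + (1.0 - weight) * p_cnn
    return fused, fused >= threshold
```

Lowering `threshold` below 0.5 biases the combined decision toward flagging URLs as malicious, trading more false positives for fewer missed attacks.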

Keywords/Search Tags:

Web attacks, malicious URLs, feature extraction, machine learning, model integration

PDF Full Text Request

Related items

1	Research And Implementation For Detecting Methods Of Malicious URLs Based On Machine Learning
2	Research On Malicious Web Page Recognition Based On Feature Fusion And Machine Learning
3	Learning to detect malicious URLs
4	Research On Malicious URL Detection Technology Based On Machine Learning
5	Malicious Urls Detection Using Deep Learning
6	Research On Malicious URL Recognition Based On Machine Learning And Its System Implementation
7	Research On User Malicious Comments Detection Based On Machine Learning
8	Detection Of Malicious URLs In Online Social Networks
9	Research On Malicious URLs Detection Based On Neural Network Model
10	Machine Learning Based Complex Surface Feature Extraction And Segmentation Method And Its Applications