Research On Malicious URL Detection Based On Machine Learning

Posted on: 2022-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: P Su
GTID: 2518306764479384
Subject: Automation Technology
Abstract/Summary:
With the development of Internet technology, computers have greatly facilitated our lives, but they have also brought many new web security risks. In the field of web security, URLs in HTTP messages have long been an important carrier for hackers to carry out network attacks. Hackers can use URLs to implement phishing, cross-site scripting (XSS), SQL injection, and other attacks. URLs that attempt to attack other computers are known as malicious URLs, and they cause serious damage to individuals, societies, and even countries. Effective URL detection is therefore necessary, but blacklists and other simple traditional detection methods are no longer effective against ever-changing attack methods. Driven by big data and improved hardware computing performance, applying machine learning algorithms to URL detection has become feasible. Existing machine learning detection algorithms often rely on a single model, which attackers can easily bypass or even defeat. This thesis studies the key technologies of malicious URL detection and their applications, drawing on and incorporating relevant research results from the field of natural language processing. The main work and contributions of this thesis can be summarized as follows:
1. A large amount of data is collected from real web servers and open source communities, and the raw data is processed through operations such as cleaning and data balancing.
2. On the basis of empirical features and TF-IDF statistical feature extraction, three URL detection models based on traditional machine learning are studied and implemented: SVM, decision tree, and random forest.
3. URLs are segmented at the word level according to special characters, and each URL is vectorized by mapping its words into a vector space with a Word2vec model. LSTM-attention and TextCNN deep learning models are then each used to complete the URL detection task.
4. To address the poor performance of the original LSTM network on longer URLs, a URL detection model based on an AFSLSTM network is proposed, which can autonomously extract the features most relevant to the classification task. Training and testing show that the model achieves high detection accuracy.
5. A new URL feature representation method is proposed, which obtains a URL feature vector reflecting global features by average pooling over the original URL vector matrix. A one-dimensional CNN then extracts higher-dimensional features from this vector, and finally a fully connected network performs the classification task.
6. Based on the AFSLSTM network and the global-feature CNN network, a fused deep learning model for judging malicious tendency is proposed and implemented, and it is shown to detect malicious URLs more effectively.
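The word-level segmentation and average-pooling steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the delimiter rule (any non-alphanumeric character), the helper names `segment_url` and `average_pool`, and the tiny `toy_embeddings` table standing in for a trained Word2vec model are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Assumption: every non-alphanumeric character counts as a "special character".
# The abstract does not list the exact delimiter set used in the thesis.
_DELIMITERS = re.compile(r"[^A-Za-z0-9]+")


def segment_url(url: str) -> List[str]:
    """Split a URL into lowercase word-level tokens at special characters."""
    return [tok.lower() for tok in _DELIMITERS.split(url) if tok]


def average_pool(matrix: List[List[float]]) -> List[float]:
    """Collapse a (tokens x dim) matrix into one dim-sized vector of column means."""
    if not matrix:
        return []
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]


# Hypothetical 2-dimensional embeddings standing in for a trained Word2vec model.
toy_embeddings: Dict[str, List[float]] = {
    "http": [1.0, 0.0],
    "example": [0.0, 1.0],
    "com": [1.0, 1.0],
    "login": [2.0, 2.0],
}

tokens = segment_url("http://example.com/login")
# One row per known token: this is the URL vector matrix.
matrix = [toy_embeddings[t] for t in tokens if t in toy_embeddings]
# A single fixed-length vector summarizing the whole URL.
global_vector = average_pool(matrix)
```

Because pooling averages over the token axis, URLs of any length map to a vector of the same dimension, which is what allows a fixed-size network such as a one-dimensional CNN to consume it.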
Keywords/Search Tags: Web attacks, malicious URLs, feature extraction, machine learning, model integration