With the continuous progress of information technology, daily life has become increasingly efficient and convenient. Almost everyone can use the Internet for social activities such as shopping, making payments, socializing, and following the news. Many governments and public institutions have also moved their services online, making government processing more efficient and public data more transparent. The development of the Internet has thus brought substantial benefits to all sectors of society. However, whether a technology is beneficial depends on the application scenario and on the people who use it. The Internet not only makes people's social life more convenient, but also gives attackers an efficient platform for crime. To achieve their covert goals, cybercriminals look for security vulnerabilities in the network and exploit them, causing huge losses to the public. Without leaving home, a user can click a URL to access services that once required an offline visit. However, as the number of web pages grows rapidly, the number of web-based attacks is also increasing, which seriously endangers the security of people's information and property and exposes users to serious network security threats. Because URLs are the most commonly used entry point into cyberspace, identifying and detecting malicious URLs is clearly important. Traditional malicious URL identification methods are mainly blacklist-based or rule-matching-based, and they suffer to varying degrees from problems such as low recognition precision and high cost. Using machine learning algorithms, this article studies a large amount of data collected from web servers and open source communities and achieves high recognition accuracy and efficiency. On this basis, a malicious URL recognition system has been designed and developed, which can be deployed in actual production
environments.

Malicious URL identification is an important issue in network security. As the number of network threats grows, traditional rule-based malicious URL detection methods can no longer meet practical needs. In recent years, machine-learning-based malicious URL identification methods have been widely studied and applied. This paper proposes one such method. First, through data collection and preprocessing, a dataset containing a large number of malicious and normal URLs is constructed. Next, a variety of features are selected and processed through feature engineering to obtain a high-dimensional feature vector. Then, several machine learning algorithms, including SVM, decision trees, and random forests, are trained and compared, and an efficient malicious URL identification model is selected. Experimental results show that the proposed method achieves high accuracy and recall and can effectively identify malicious URLs and guard against network threats.

The main work of the paper includes:

1. Introduce malicious-URL-related technologies by presenting the relevant web theory, collect a large amount of data from web servers and open source communities, and clean and filter the raw data.
2. Describe the relevant machine learning algorithms and select those suited to judging malicious URLs for study.
3. Based on empirical features and TF-IDF statistical feature extraction, study and implement three URL detection models built on traditional machine learning: support vector machines, decision trees, and random forests. By experimentally comparing the two feature extraction methods and the three algorithms, identify the best model for URL detection.
4. Develop a malicious URL detection system using the proposed detection model, and test and analyze the results.
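
The two feature extraction approaches named in this abstract, empirical features and TF-IDF statistical features, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the specific empirical features, the use of character 3-grams as TF-IDF terms, the smoothed IDF formula, and the function names `empirical_features`, `char_ngrams`, and `tfidf_vectors` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def empirical_features(url: str) -> dict:
    """Hand-crafted lexical features of a URL (illustrative feature set, assumed)."""
    # Add a scheme when missing so urlparse can locate the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Characters other than letters, digits, '.', '/' and ':'.
        "num_special": len(re.findall(r"[^A-Za-z0-9./:]", url)),
        # 1 if the host looks like a dotted IPv4 address, else 0.
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
    }


def char_ngrams(url: str, n: int = 3) -> list:
    """Split a lowercased URL into overlapping character n-grams."""
    text = url.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(urls: list, n: int = 3) -> list:
    """Return one sparse TF-IDF vector (dict of n-gram -> weight) per URL."""
    docs = [Counter(char_ngrams(u, n)) for u in urls]
    # Document frequency: in how many URLs each n-gram occurs.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    total = len(docs)
    vectors = []
    for doc in docs:
        length = sum(doc.values()) or 1  # guard against URLs shorter than n
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            gram: (count / length)
            * (math.log((1 + total) / (1 + doc_freq[gram])) + 1)
            for gram, count in doc.items()
        })
    return vectors


if __name__ == "__main__":
    samples = [
        "http://example.com/index.html",
        "http://192.168.0.1/login.php?user=a&pass=b",
    ]
    for sample in samples:
        print(empirical_features(sample))
    print(len(tfidf_vectors(samples)[0]), "distinct 3-grams in the first URL")
```

Either kind of feature vector could then be passed to a classifier such as a support vector machine, a decision tree, or a random forest; that training step is omitted here to keep the sketch dependency-free.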