| The spread of the COVID-19 pandemic has made people more dependent on online Internet services. With continuing improvements in computer performance and network bandwidth, many client-based online services have begun to migrate to web applications. The Uniform Resource Locator (URL), as the entry point to web application services, makes people’s lives more convenient but also introduces network risks. With URLs serving as the main entry point to online services, accurately and efficiently identifying malicious URLs and reducing the security threats that users face is therefore an urgent problem in network security. Traditional malicious URL identification methods, such as blacklist filtering, rule matching, and host behavior interaction analysis, suffer from high maintenance cost, low identification accuracy, and high performance overhead. Based on machine learning theory, this paper therefore studies a malicious URL identification method that integrates multi-dimensional features and can mine malicious URLs with high accuracy and efficiency. Drawing on practical experience in intelligence information analysis, a malicious URL identification system was developed, successfully deployed in a real production environment, and achieved good practical results. The main work of the thesis includes: (1) By expounding the concept of the Indicator of Compromise (IoC) and the categories and threat forms of malicious URLs, the thesis shows that malicious URLs, as the most important type of malicious IoC, have extremely high engineering value; it analyzes malicious URL identification techniques based on static and dynamic features and describes the classical machine learning classification algorithms commonly used in malicious URL identification. (2) Construct engineering datasets: extract data from device security logs in real production 
environments, clean, screen, and filter the data, preprocess it according to the features required to identify malicious URLs, and construct standardized datasets. (3) To address the lack of real production data and the low engineering practicability of current identification methods, a malicious URL identification method based on multi-dimensional feature fusion is proposed, combining machine learning theory with a real production environment; experiments are carried out to verify the feasibility and effectiveness of the method. (4) A malicious URL identification system is designed and implemented. The system automatically obtains URL-related feature data from security device logs in batches, processes the data with the malicious URL identification method based on multi-dimensional feature fusion proposed above, and visualizes the corresponding malicious URLs in a Web page. (5) After integrated debugging, the system is deployed to the actual production environment. After several rounds of trial runs, the system now mines more than 1,000 malicious URLs per day from the logs of various types of security devices, effectively enriching the Enterprise Threat Intelligence Repository. The research results show that, compared with traditional malicious URL identification methods, the proposed method can identify five types of malicious URLs at lower performance cost under the constraints of practical engineering scenarios with limited computing resources. The system designed and implemented in this paper runs well in the actual production environment. It adopts a modular design framework that can support the installation of other malicious information identification algorithms and provides strong support for the subsequent comprehensive mining of multiple types of malicious information. |