| While the Internet brings users rich resources and convenient services, its openness and anonymity have also made it a platform for network attacks. Malicious web pages play a major role in many network security problems and pose serious threats to users. Effective detection of malicious web pages has therefore become a key research topic in network security. This paper analyzes traditional malicious web page recognition methods. To improve detection performance, it proposes a hybrid feature selection method based on improved Relief feature filtering and PCA dimensionality reduction, together with an undersampling-based training method for a multi-layer ensemble learning model. Building on these methods, an online malicious web page detection system based on machine learning is designed and implemented. The main research contents of this paper are as follows: (1) A static feature selection method for malicious web page detection is studied. When machine learning algorithms are used to detect malicious web pages, simply adding more features leads to the curse of dimensionality and feature redundancy. Taking into account feature importance, the impact of imbalanced data sets, and the correlation between features, this paper proposes a hybrid feature selection algorithm for malicious web page features. It improves the Relief feature selection algorithm with undersampling and combines it with PCA dimensionality reduction, addressing two weaknesses of Relief: poor performance on imbalanced data sets and no consideration of the correlation between features. (2) A method for building a malicious web page detection model on imbalanced data sets is studied. Because malicious web pages are far outnumbered by benign ones, detection results tend to be biased toward the majority class. To address this problem, a malicious web page
detection model that combines undersampling, cost-sensitive learning, and multi-layer ensemble learning is proposed. Undersampling achieves local data balance, while ensemble learning based on cost-sensitive learning preserves the integrity of global information. Experimental results show that the undersampling-based multi-layer ensemble learning model outperforms traditional machine learning models in identifying malicious web pages on imbalanced data sets. (3) Building on the above research, an online malicious web page detection website based on the hybrid feature selection algorithm and the undersampling-based multi-layer ensemble learning model is constructed; it determines whether a page submitted by a user is benign or malicious. The system also further verifies the feasibility and practicality of the proposed malicious web page detection model. |
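
The abstract does not spell out how Relief is improved by undersampling, so the following is only a minimal sketch of one plausible reading: balance the two classes by randomly undersampling the majority class, then run the classic binary Relief weight update on the balanced sample and keep the top-ranked features. The function names (`undersample`, `relief_weights`, `select_features`), the toy data, and all parameter values are assumptions made for illustration, not the paper's implementation. The PCA step that the paper applies afterwards is omitted here.

```python
import random

def undersample(X, y, rng):
    """Randomly drop majority-class rows until both classes have equal size."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]

def relief_weights(X, y, rng, n_iter=100):
    """Classic binary Relief: reward features that differ on the nearest miss,
    penalize features that differ on the nearest hit."""
    n_features = len(X[0])
    # Per-feature value ranges, used to scale differences into [0, 1].
    spans = []
    for j in range(n_features):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(n_features))

    weights = [0.0] * n_features
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hits = [k for k in range(len(X)) if k != i and y[k] == y[i]]
        misses = [k for k in range(len(X)) if y[k] != y[i]]
        hit = min(hits, key=lambda k: distance(X[i], X[k]))
        miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(n_features):
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n_iter
    return weights

def select_features(X, y, k, seed=0):
    """Undersample, score features with Relief, return the top-k indices."""
    rng = random.Random(seed)
    X_bal, y_bal = undersample(X, y, rng)
    w = relief_weights(X_bal, y_bal, rng)
    ranked = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    return ranked[:k], w

# Toy imbalanced data (hypothetical): feature 0 separates the classes,
# feature 1 is pure noise. Label 1 = malicious, 0 = benign.
data_rng = random.Random(1)
X, y = [], []
for _ in range(200):  # benign pages (majority class)
    X.append([data_rng.uniform(0.0, 0.3), data_rng.random()])
    y.append(0)
for _ in range(20):   # malicious pages (minority class)
    X.append([data_rng.uniform(0.7, 1.0), data_rng.random()])
    y.append(1)

selected, weights = select_features(X, y, k=1)
print(selected, weights)
```

In this toy setup the informative feature should receive a clearly higher Relief weight than the noise feature. In a fuller pipeline, the selected features would then be passed to a PCA step to reduce the correlation between them, as the abstract describes.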