Research On Malicious Web Page Recognition Based On Hybrid Feature Selection And Subsampling Multilayer Integrated Learning

Posted on: 2024-09-08

Degree: Master

Type: Thesis

Country: China

Candidate: X W Yu

GTID: 2558307127960609

Subject: Cyberspace security

Abstract/Summary:

While the Internet brings users rich resources and convenient services, its openness and anonymity have also made it a platform for network attacks. Among the many network security problems, malicious web pages play an important role and pose many threats to users. How to detect malicious web pages effectively has therefore become a key research topic in network security. This thesis analyzes traditional malicious web page recognition methods. To improve detection performance, it proposes a hybrid feature selection method based on improved Relief feature filtering and PCA dimensionality reduction, together with an undersampling-based training method for a multi-layer ensemble ("integrated") learning model. Building on these methods, an online malicious web page detection system based on machine learning is designed and implemented. The main research contents are as follows:

(1) Static feature selection for malicious web page detection. When machine learning algorithms are used to detect malicious web pages, simply adding more features leads to the curse of dimensionality and to feature redundancy. Taking into account feature importance, the impact of unbalanced data sets, and the correlation between features, this thesis proposes a hybrid feature selection algorithm built on Relief: Relief is improved with undersampling and then combined with PCA dimensionality reduction. This addresses two weaknesses of Relief: it performs poorly on unbalanced data sets, and it does not consider the correlation between features.

(2) Building a malicious web page detection model on unbalanced data sets. Because the proportions of malicious and benign web pages are severely imbalanced, detection results are biased towards the majority class. To solve this problem, a detection model combining undersampling, cost-sensitive learning, and multi-layer ensemble learning is proposed. Undersampling achieves local data balance, while ensemble learning based on cost-sensitive learning preserves the integrity of the global information. The experimental results show that the undersampling-based multi-layer ensemble learning model outperforms traditional machine learning models in identifying malicious web pages on unbalanced data sets.

(3) Based on the above research, an online malicious web page detection website is built on the hybrid feature selection algorithm and undersampling-based multi-layer ensemble learning. It determines whether a page submitted by a user is normal or malicious. The system also further verifies the feasibility and practicality of the malicious web page detection model proposed in this thesis.
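
The abstract does not give algorithmic details, so the following is only a minimal sketch of the two uses of undersampling it describes, not the thesis's implementation. It assumes binary labels (1 = malicious, 0 = benign), features scaled to [0, 1], and at least two samples in the minority class. All function names, the toy data, and the nearest-centroid base learner are hypothetical choices made for illustration. The PCA step, the multi-layer structure, and the cost-sensitive weighting are not reproduced; the `threshold` argument is only a crude stand-in for cost sensitivity.

```python
import random

def undersample(X, y, rng):
    """Balanced subset: every minority row plus an equal-sized random draw of majority rows."""
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    idx = minority + rng.sample(majority, len(minority))
    return [X[i] for i in idx], [y[i] for i in idx]

def relief_weights(X, y):
    """Basic Relief: a feature gains weight when it differs at the nearest
    other-class sample (miss) and agrees at the nearest same-class sample (hit)."""
    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))
    weights = [0.0] * len(X[0])
    for i, (xi, yi) in enumerate(zip(X, y)):
        hits = [x for j, (x, lab) in enumerate(zip(X, y)) if j != i and lab == yi]
        misses = [x for x, lab in zip(X, y) if lab != yi]
        hit = min(hits, key=lambda x: dist(xi, x))
        miss = min(misses, key=lambda x: dist(xi, x))
        for f in range(len(weights)):
            weights[f] += (abs(xi[f] - miss[f]) - abs(xi[f] - hit[f])) / len(X)
    return weights

def balanced_relief(X, y, rounds=10, seed=0):
    """Assumed variant: average Relief weights over several undersampled subsets."""
    rng = random.Random(seed)
    runs = [relief_weights(*undersample(X, y, rng)) for _ in range(rounds)]
    return [sum(col) / rounds for col in zip(*runs)]

def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector per class."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict_one(cents, x):
    """Label of the closest class centroid (squared Euclidean distance)."""
    d = {lab: sum((a - b) ** 2 for a, b in zip(c, x)) for lab, c in cents.items()}
    return min(d, key=d.get)

def fit_ensemble(X, y, n_models=5, seed=0):
    """Train each base learner on its own balanced subset (a single layer only)."""
    rng = random.Random(seed)
    return [fit_centroids(*undersample(X, y, rng)) for _ in range(n_models)]

def predict(models, x, threshold=0.5):
    """Flag as malicious when the share of malicious votes reaches the threshold."""
    share = sum(predict_one(m, x) for m in models) / len(models)
    return 1 if share >= threshold else 0

# Toy unbalanced data (assumed): feature 0 separates the classes, feature 1 is noise.
rng = random.Random(42)
X, y = [], []
for _ in range(40):   # benign majority
    X.append([rng.uniform(0.0, 0.2), rng.random()]); y.append(0)
for _ in range(8):    # malicious minority
    X.append([rng.uniform(0.8, 1.0), rng.random()]); y.append(1)

weights = balanced_relief(X, y)   # feature 0 should score higher than feature 1
models = fit_ensemble(X, y)
print(weights)
print(predict(models, [0.9, 0.5]))  # 1
print(predict(models, [0.1, 0.5]))  # 0
```

Because each base learner sees a balanced subset, no single model is dominated by the majority class, while the ensemble as a whole still draws on many different majority samples. Setting `threshold` below 0.5 flags a page when fewer models vote malicious, which trades more false alarms for fewer missed malicious pages.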

Keywords/Search Tags:

Malicious Web pages, Undersampling, Feature selection, Unbalanced data, Multi-layer integrated learning

Related items

1	Application Research Of Unbalanced Data Classification Algorithm Based On Integrated Learning
2	Research On Malicious Web Page Recognition Based On Feature Fusion And Machine Learning
3	The Research And Implementation Of Malicious Web Pages Detection
4	Research On Unbalanced Data Classification Based On Ensemble Learning
5	Research On High-dimensional Unbalanced Data Classification Algorithm Based On Feature Selection And Ensemble Learning
6	Selection And Classification Of Unbalanced Data Based On Semi-Supervised And Integrated Learning
7	Research On Static Detection Method For Android Malicious Application
8	Research On Under-sampling Classification Method Of Unbalanced Data
9	Research On Classification Algorithm For Unbalanced Data
10	Feature Selection For Unbalanced Data And Emotional Dictionary Building