
Research On Malicious Web Page Recognition Based On Feature Fusion And Machine Learning

Posted on: 2021-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Wei
GTID: 2428330614963977
Subject: Software engineering
Abstract/Summary:
The rapid development of information technology enriches people's lives and meets their varied needs. While it brings convenience, it also makes web pages increasingly complex and gives rise to many malicious web pages. A malicious web page exploits web page vulnerabilities, without the user's knowledge, to compromise the user's security, including personal privacy and property. The spread of malicious web pages damages cyberspace and can even endanger social and national security. As the Internet develops, these attacks grow more complex and new malicious web pages keep emerging, which makes web page recognition harder.

This thesis analyzes traditional malicious web page recognition methods, proposes a new class of web page features based on HTTP requests, and fuses them with the traditional features drawn from the web page URL, JavaScript code and HTML code, so that malicious web pages can be identified with a machine learning classification algorithm. The feature selection method based on information gain is improved in order to raise the accuracy of the classification algorithm. A malicious web page recognition system based on feature fusion and machine learning is designed and implemented. The main work and achievements are as follows:

(1) Based on an analysis of the traditional features of some malicious web pages (web page URL, JavaScript code and HTML code), a set of features is defined for malicious web page recognition, and additional features based on HTTP request information are proposed. The HTTP request features are fused with the traditional web page features, and a machine learning classification algorithm is then used to build a web page classification model that distinguishes normal from malicious web pages. Experiments show that the method combining URL features, web code features and HTTP request features performs better than the method without HTTP request features, and that the random forest classification algorithm is more suitable for malicious web page recognition.

(2) The thesis examines traditional feature selection methods, analyzes their advantages and disadvantages, and proposes an improved algorithm. The improvement addresses the problem of excessive correlation between features and each category, and the problem of negative correlation values. It thereby improves the effectiveness of feature selection and the performance of the classification model.

(3) Based on the malicious web page identification method proposed above, an extension program is implemented as a Chrome extension. As a browser plug-in, it detects in real time whether the web page a user visits is normal or malicious. The front end is responsible for web monitoring and data collection, and the back end is responsible for data crawling and prediction. The feasibility and accuracy of the program are also verified.
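As an illustration of the information-gain-based feature selection mentioned in the abstract, the following Python sketch ranks a few binary URL features by standard information gain. It does not reproduce the thesis's improved algorithm, whose details are not given in the abstract. The feature names (long_url, has_ip_host, has_at_sign, many_subdomains), the thresholds and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def url_features(url):
    """Extract hypothetical binary lexical features from a URL.

    These features and thresholds are illustrative assumptions,
    not the feature set defined in the thesis.
    """
    host = urlparse(url).hostname or ""
    return {
        "long_url": len(url) > 60,
        # Crude IPv4 check: the host consists only of digits and dots.
        "has_ip_host": host.replace(".", "").isdigit(),
        "has_at_sign": "@" in url,
        "many_subdomains": host.count(".") >= 3,
    }


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Standard information gain of one discrete feature: H(Y) - H(Y | X)."""
    total = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def rank_features(samples, labels):
    """Return (feature name, gain) pairs sorted by descending gain."""
    gains = {
        name: information_gain([s[name] for s in samples], labels)
        for name in samples[0]
    }
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
```

Under these assumptions, rank_features([url_features(u) for u in urls], labels) would order the features by how much each reduces uncertainty about the normal/malicious label. The top-ranked features could then be passed to a classifier such as a random forest.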
Keywords/Search Tags:Malicious Web Pages, Web Security, Web Features, HTTP Requests, Machine Learning, Feature Selection