Research On Malicious Web Page Detection Technology With The GAN Generation Technology

Posted on: 2021-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: M X Wan
Full Text: PDF
GTID: 2518306497966639
Subject: Computer Science and Technology
Abstract/Summary:
The Internet has brought many conveniences to people's lives, but it has also caused many information security problems. Malicious web pages disguise themselves as normal web pages to illegally obtain users' information without their knowledge. Analyzing and extracting the features of malicious web pages is the focus of research on machine-learning-based malicious web page detection. There are objective differences between malicious and normal web pages in URL text, web content, and other aspects; these features can be extracted and fed to machine learning classification algorithms to detect malicious web pages. However, malicious web pages are short-lived and their attack methods change frequently. In addition, annotating web page samples is labor-intensive, and there is no authoritative training data set of malicious web pages. Therefore, studying methods for expanding web page samples has academic value and practical significance for machine-learning-based malicious web page detection. The main research contents of this thesis are as follows:

(1) The traditional features used in malicious web page detection methods are summarized and introduced according to their sources. Formal definitions (formulas) of these features are then given, and some of the features are improved. The experiments show that a classifier trained on the traditional features defined in this thesis performs better than a classifier trained on a single type of feature.

(2) A new kind of text vector feature for malicious web page detection is designed, and a detection method based on both the traditional features and the text vector feature is proposed. First, key texts are extracted from the HTML document of each web page sample. These texts are then converted into feature vectors by text vectorization. Finally, the vectors are used as the text vector feature of the sample. The experimental results show that the text vector feature is not effective when used alone, but that it improves detection performance to a certain extent when combined with the traditional web page content features. Compared with existing fusion schemes, the proposed detection scheme achieves higher accuracy and F1 score.

(3) A malicious web page detection method that uses GAN-generated samples to expand the training set is proposed, to overcome the difficulty of collecting and annotating samples. To improve the quality of the web page samples generated by the GAN, a generative adversarial network dedicated to generating web page feature samples, called WFS-GAN, is designed and implemented. WFS-GAN is based on the conditional generative adversarial network (CGAN): it uses the web page's class label as conditional information and adds a local feature discriminator to control the quality of the details of the category feature data in the generated samples. WFS-GAN can generate web page feature samples of both classes, normal and malicious, and the quality of the generated samples is improved. The experimental results demonstrate that the web page feature samples generated by WFS-GAN are of better quality than those generated by CGAN or CVAE. Malicious web page detection classifiers trained with WFS-GAN-generated samples achieve higher recall than the ordinary classifiers used for comparison.

(4) A prototype system for malicious web page detection is designed and implemented. The system is divided into three modules: the fusion feature extraction module extracts features from the web page; the web page feature sample generation module generates web page feature samples with the WFS-GAN generator; and the malicious web page detection module trains the classifier on the expanded samples together with the original samples, so that malicious web pages can be detected when only a small number of real web page samples are available. The experimental results show that this prototype system performs well in detecting malicious web pages when an appropriate number of expanded samples is used.
Keywords/Search Tags:malicious web page detection, malicious web page feature, machine learning, generative adversarial network
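
The feature fusion described in item (2) of the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. The abstract does not say which key texts are extracted, which vectorization method is used, or which traditional features are chosen, so everything below is an assumption: the key texts are taken from `<title>` and `<a>` tags, the text vector is a hashed bag of words, and the three URL features (length, dots in the host, digit count) are merely plausible examples. The names `KeyTextExtractor`, `url_features`, `text_vector`, and `fused_features` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urlparse


class KeyTextExtractor(HTMLParser):
    """Collects text inside <title> and <a> tags (an assumed choice of key texts)."""

    KEY_TAGS = {"title", "a"}

    def __init__(self):
        super().__init__()
        self._depth = 0   # how many key tags are currently open
        self.texts = []   # extracted key text fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.KEY_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.KEY_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self.texts.append(data.strip())


def url_features(url):
    """Three illustrative URL features (assumed, not taken from the thesis)."""
    host = urlparse(url).netloc
    return [
        float(len(url)),                       # URL length
        float(host.count(".")),                # number of dots in the host
        float(sum(c.isdigit() for c in url)),  # number of digits in the URL
    ]


def text_vector(texts, dim=8):
    """Hashed bag-of-words vector, normalized so non-empty vectors sum to 1."""
    vec = [0.0] * dim
    for text in texts:
        for token in text.lower().split():
            digest = hashlib.md5(token.encode("utf-8")).digest()
            vec[digest[0] % dim] += 1.0  # bucket index from the first hash byte
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def fused_features(url, html, dim=8):
    """Concatenates the URL features with the text vector of the page's key texts."""
    parser = KeyTextExtractor()
    parser.feed(html)
    return url_features(url) + text_vector(parser.texts, dim)
```

Calling `fused_features(url, html)` returns one flat list per web page, which could then be passed to any classifier. Concatenation is the simplest possible fusion and is used here only for illustration.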