Key Technologies Of Malicious Webpage Detection For Feature Hiding

Posted on:2024-04-16

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Qin

Full Text:PDF

GTID:1528307310482264

Subject:Communication systems and information security

Abstract/Summary:


Malicious webpages have become one of the major network security threats faced by Internet users. After a user inadvertently visits a malicious webpage, attackers can exploit vulnerabilities in the browser or its plugins to gain control of the user's terminal, then use it as a springboard to illegally obtain information, spread backdoors or ransomware, and so on, posing a serious threat to cyberspace security. Attackers typically chain a large number of redirects to lead users to the malicious webpage, where malicious code is ultimately executed to launch the attack. Redirect features and executable-code features are therefore important criteria for detecting malicious webpages, and attackers usually hide these features to evade detection. This paper addresses the anti-detection techniques that existing malicious webpage detection methods struggle to handle, and conducts in-depth research on key technologies for detecting malicious webpages whose features are hidden. The main results are as follows:

(1) When extracting webpage redirection behavior, static analysis can only recover HTTP redirects and cannot observe redirects triggered by JavaScript, Flash, and similar mechanisms. Dynamic analysis recovers redirects more effectively, but its large detection overhead makes it hard to apply in many scenarios, such as real-time detection. We propose HMG, a method that relies solely on static analysis of network traffic to detect malicious webpages. HMG builds a graph model of the web session generated when a user visits a malicious webpage. Because malicious webpages make heavier use of JavaScript redirects, we propose an isolated-nodes feature that measures the number of redirects that cannot be observed, and we combine it with graph-structure and other features to classify sessions with a random forest model. Experiments on a real-world dataset show that HMG achieves a precision of 95.7% and a recall of 93.4%, improvements of 12.1% and 8.4%, respectively, over existing methods. In addition, HMG can directly identify the malicious webpages embedded in the interaction process with an accuracy of 97.7%, which can effectively assist security experts in manual analysis.

(2) To evade detection, attackers reduce the number of redirects so that the interaction graphs generated by visiting malicious and benign webpages become structurally similar, which degrades existing detection methods based on redirect and graph-structure features. We propose P-HMG, a detection method based on graph neural networks. Because malicious and benign webpages differ in the types of external resources they load, P-HMG uses the number of times each type of external resource is loaded as the feature of each webpage. By adding potential edges to isolated nodes, it performs graph representation learning on incomplete graphs, and it combines the learned representations with the statistical features extracted by HMG to train the detection model. Experiments show that P-HMG improves on existing methods in all metrics, with an accuracy of 97.58%, a precision of 97.36%, a recall of 97.51%, and an F1 score of 97.42%.

(3) As the most widely used executable code on the web, JavaScript has become the most common attack vector in malicious webpages. To hide executable-code features, attackers usually process attack scripts with code obfuscation techniques. Obfuscation inflates script size, so existing deep learning-based detection methods struggle to feed the complete code representation sequence into the model. We propose ZipAST, a detection method based on sequence compression. It exploits two observations: code obfuscation generates a large number of duplicate fragments, and an abstract syntax tree (AST) may use multiple atomic operations to express a single code behavior. By replacing high-frequency fragments with new tags, ZipAST shortens the input sequence without affecting its original semantics. Results on public datasets show that ZipAST achieves a compression ratio of more than 12.9:1. With an input sequence length of 1,000, ZipAST achieves detection accuracies of 96.10%, 96.25%, and 95.40% on the original (unobfuscated) dataset and two obfuscated datasets, respectively, improvements of 8.45%, 26.35%, and 19.05% over existing methods, which demonstrates the effectiveness of the method.

(4) Code obfuscation not only increases script size but also changes the original code structure. Because structural information is crucial for representing code behavior, existing methods have difficulty detecting obfuscated malicious scripts effectively. We propose TransAST, a detection method based on machine translation. It targets the popular obfuscation tool javascript-obfuscator, which applies fixed templates when generating obfuscated code, so that pre-obfuscation and post-obfuscation code have a structural mapping relationship. By training a machine translation model, TransAST restores the structural information of obfuscated code. Experiments show that, compared with existing state-of-the-art methods, the proposed method improves by at least 6.58% in accuracy, 6.83% in precision, 5.94% in recall, and 6.82% in F1 score. Robustness experiments with different obfuscation tools further show that the method is equally effective against other types of obfuscation tools.
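Result (1) hinges on the isolated-nodes feature. The sketch below is a minimal illustration under stated assumptions, not the HMG implementation: it assumes a hypothetical `HttpTransaction` record reconstructed from captured traffic, and it assumes the only statically observable edges are Referer links and HTTP 3xx `Location` redirects. A page that ends up with no edge at all is counted as isolated, i.e. it was plausibly reached through a redirect (for example, one issued by JavaScript) that static analysis cannot see.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class HttpTransaction:
    """One request/response pair recovered from captured traffic (hypothetical schema)."""
    url: str
    referer: Optional[str] = None   # Referer request header, if present
    status: int = 200               # HTTP response status code
    location: Optional[str] = None  # Location response header (3xx redirects)


def build_session_graph(transactions: List[HttpTransaction]) -> Dict[str, Set[str]]:
    """Build a directed graph {url: successor urls} for one web session.

    Only statically observable links become edges (an assumption of this sketch):
    Referer -> requested URL, and URL -> Location for HTTP 3xx redirects.
    """
    graph: Dict[str, Set[str]] = {}
    for tx in transactions:
        graph.setdefault(tx.url, set())
        if tx.referer:
            graph.setdefault(tx.referer, set()).add(tx.url)
        if 300 <= tx.status < 400 and tx.location:
            graph[tx.url].add(tx.location)
            graph.setdefault(tx.location, set())
    return graph


def count_isolated_nodes(graph: Dict[str, Set[str]]) -> int:
    """Count nodes with neither incoming nor outgoing edges."""
    has_incoming = {dst for succs in graph.values() for dst in succs}
    return sum(1 for node, succs in graph.items() if not succs and node not in has_incoming)


def session_features(transactions: List[HttpTransaction]) -> List[int]:
    """Toy feature vector: [node count, edge count, isolated-node count]."""
    graph = build_session_graph(transactions)
    n_edges = sum(len(succs) for succs in graph.values())
    return [len(graph), n_edges, count_isolated_nodes(graph)]
```

For example, a session in which `http://a.example/` 302-redirects to `http://b.example/`, which then loads `app.js`, plus a page `http://c.example/land` fetched with no Referer and no redirect pointing to it, yields `[4, 2, 1]`: `c.example/land` is the single isolated node. In a full pipeline, vectors like this, together with other graph-structure features, could be passed to a random forest such as scikit-learn's `RandomForestClassifier`; the actual HMG feature set is not specified in this abstract.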

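Result (3) shortens AST sequences by replacing high-frequency fragments with new tags. The abstract does not give ZipAST's exact procedure, so the sketch below swaps in a byte-pair-encoding-style scheme as an assumption: it repeatedly merges the most frequent adjacent pair of AST node types into a fresh tag and records each merge, so the original sequence can be restored exactly (the "without affecting the original semantics" property). The tag format `<Tk>` and the `max_merges`/`min_count` parameters are hypothetical, and the tag names are assumed not to collide with real node-type names.

```python
from collections import Counter


def most_frequent_pair(seq):
    """Return (pair, count) for the most frequent adjacent token pair, or (None, 0)."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return None, 0
    return pairs.most_common(1)[0]


def merge_pair(seq, pair, new_tag):
    """Replace non-overlapping occurrences of `pair`, scanning left to right, with `new_tag`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_tag)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def compress(seq, max_merges=100, min_count=2):
    """Greedily merge frequent adjacent pairs; return (compressed seq, merge table)."""
    merges = {}
    seq = list(seq)
    for k in range(max_merges):
        pair, count = most_frequent_pair(seq)
        if count < min_count:
            break  # no pair repeats often enough to be worth a new tag
        tag = f"<T{k}>"
        merges[tag] = pair
        seq = merge_pair(seq, pair, tag)
    return seq, merges


def decompress(seq, merges):
    """Recursively expand tags to recover the original sequence."""
    out = []
    for tok in seq:
        if tok in merges:
            out.extend(decompress(list(merges[tok]), merges))
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    # Toy AST node-type sequence with a repeated fragment, as obfuscation tends to produce.
    seq = ["CallExpression", "MemberExpression", "Identifier", "Literal"] * 8
    packed, table = compress(seq)
    assert decompress(packed, table) == seq  # lossless round trip
    print(f"{len(seq)} -> {len(packed)} tokens")  # prints "32 -> 2 tokens"
```

Greedy pair merging is only one way to realize "replace high-frequency fragments with new tags"; it is used here because the merge table makes the compression trivially reversible. Real obfuscated scripts are far less regular than this toy input, so the 32-to-2 reduction says nothing about the ratios reported above.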
Keywords/Search Tags:

Malicious Webpage Detection, Drive-by Download Attack, Feature Hiding, Redirect, Code Obfuscation, Machine Learning


Related items

1	The Research And Implementation Of Web Malware Detection Based On Page Content
2	Anomaly Detection Of JavaScript-based Malicious Web Pages
3	Research And Implementation Of Obfuscated Drive By Download Attack Detection Technology
4	Research On Technology Of Software Protection And Malicious Code Detection Based On Code Obfuscation
5	Research On Drive-by Download Detection Based On Machine Learning
6	Research And Implementation On Machine Learning-Based Detection Of Malicious Script Codes
7	Research On JavaScript Malicious Code Detection Technology Based On Machine Learning
8	Malicious Web Page Detection System Based On Classification Algorithm
9	Malicious Webpage Detection System Based On Script Engine
10	Research On Malicious Code Detection Based On Feature Fusion And Machine Learning