Anomaly Detection Of JavaScript-based Malicious Web Pages

Posted on:2019-09-25

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H L Ma

GTID:1368330551458098

Subject:Information security

Abstract/Summary:

Web-based services and applications are now widely available on the Internet, and a large number of users visit the web every day. Under these circumstances, client security has become a very important issue. As the de facto standard language of web front-end development, JavaScript provides users with a variety of convenient services while also bringing many security threats to users' terminals. If web client applications contain vulnerabilities that users are unaware of, or if users perform improper operations when visiting malicious web pages, malicious JavaScript code can seriously threaten the security of the client. Many new malicious web pages appear on the Internet every day, so the detection of malicious web pages remains an important issue in the research community.

Summarizing existing research on common attack techniques and detection technology, this thesis first introduces existing attack techniques and methods from the attacker's point of view, analyzes the characteristics of the related attack technology, and summarizes the features of malicious JavaScript code under different attacks. From the defender's point of view, it introduces the related detection methods, analyzes the advantages and disadvantages of the various detection methods, and introduces and analyzes the characteristics of anomaly detection methods, especially the semi-supervised anomaly detection method. The advantages of the semi-supervised anomaly detection method are analyzed from the perspectives of data collection and anomaly detection. Accordingly, a basic framework for anomaly detection of JavaScript-based malicious web pages is given. Based on this framework, detection methods and prototype detection systems are proposed for three aspects: lightweight anomaly detection of malicious web pages, detection of obfuscated drive-by-download attacks, and automatic de-obfuscation of obfuscated malicious JavaScript code. We make the following contributions:

(1) We propose a lightweight method for anomaly detection of malicious web pages. The main idea is to replace a large amount of complex and irregular JavaScript code with a small number of feature words. This greatly reduces the dimensionality of the data features while retaining the execution, logic and entropy of the code. We use the distribution of the feature words to detect malicious code. The detection method uses only static analysis and is divided into data collection, data preprocessing, feature extraction, and the detector. Data collection captures the content of web pages; data preprocessing separates the JavaScript code from the page and performs lexical analysis on it. Feature extraction is based on the distribution of feature words. The detector has two phases: training and testing. In the training phase, only normal data is used. In the testing phase, principal component analysis (PCA), K-nearest neighbor (K-NN) and one-class support vector machine (One-class SVM) algorithms are used to detect malicious web pages. We collected 20996 JavaScript-based pages in a real computing environment. Extensive experimental results show that the detection system achieves a 90% detection rate at a 1% false alarm rate and can process 250 web pages per second, achieving the goal of lightweight detection.

(2) We propose a method for detecting obfuscated drive-by-download attacks. The method combines static analysis and dynamic analysis, using static analysis to detect obfuscated JavaScript code and dynamic analysis to detect the attack behaviors hidden in the obfuscation. For static analysis, it is assumed that normal pages do not use obfuscated JavaScript code, or use obfuscation very seldom. Based on this assumption, we use only normal page code for training and use the distribution of the extended feature words to detect drive-by-download attacks. If the distance between the training data and the testing data is greater than a threshold, the testing data is considered obfuscated. For dynamic analysis, nine features are constructed from the initial and final values of variables in the JavaScript code to detect obfuscated drive-by-download attack behavior. We collected 70463 JavaScript-based pages in a real computing environment. Extensive experimental results show that, with the PCA algorithm, the static analysis method achieves a detection rate of 99% at a false alarm rate of 0.1%. With the "variable hijacked" technology and the state machine model, the dynamic analysis method can detect more than 80% of obfuscated drive-by-download attack behavior in real time and can provide specific attack behavior information about the malicious JavaScript code.

(3) We propose a method for automatically detecting and de-obfuscating obfuscated JavaScript code, together with a metrics-based evaluation of de-obfuscation. We conduct an in-depth analysis of a large amount of obfuscated code produced with obfuscation tools and techniques, and analyze a large number of common internal static behavior characteristics and external dynamic behavior characteristics of obfuscation. The detection system combines static analysis and dynamic analysis. For static analysis, the weighted distribution of feature words is used as the detection feature. We assume that the more training samples a feature word appears in, the larger its weight should be. One-class SVM, K-NN and PCA algorithms are used to detect obfuscated code. We collected 80574 JavaScript-based pages in a real computing environment. Extensive experimental results show that, with the PCA algorithm, the static analysis method achieves a detection rate of 99.99% at a false alarm rate of 0.1%. The dynamic analysis has two steps: first, traversing the nodes of the abstract syntax tree (AST) of the obfuscated code; second, analyzing each node according to its type to obtain the final values of the related variables, and then de-obfuscating the code based on these values. We propose edit distance, text feature similarity and Jaccard similarity as metrics for evaluating de-obfuscation. Extensive experimental results demonstrate that our method can automatically de-obfuscate the vast majority of obfuscated malicious JavaScript code and evaluate de-obfuscation effectively.
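As a rough illustration of the lightweight detector in contribution (1), the following Python sketch turns JavaScript source into a feature-word distribution and scores a page by its distance to the nearest training vectors, using only normal pages for training. The vocabulary `FEATURE_WORDS`, the tokenizer, the sample pages and the threshold value are all hypothetical choices made for this example; the abstract does not specify them, and the thesis also evaluates PCA and One-class SVM, which are not shown here.

```python
import math
import re
from collections import Counter

# Hypothetical feature-word vocabulary; the abstract does not list the real one.
FEATURE_WORDS = ["eval", "unescape", "document", "write", "function",
                 "var", "if", "for", "return", "fromCharCode"]

# Simplified lexical analysis: identifier-like tokens only (an assumption).
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def feature_distribution(js_code):
    """Relative frequency of each feature word among all identifier-like tokens."""
    tokens = TOKEN_RE.findall(js_code)
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[w] / total for w in FEATURE_WORDS]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(sample, training, k=1):
    """Mean distance from the sample to its k nearest training vectors."""
    nearest = sorted(euclidean(sample, t) for t in training)[:k]
    return sum(nearest) / len(nearest)


def is_anomalous(js_code, training, threshold, k=1):
    """Flag a page whose K-NN distance to the normal data exceeds the threshold."""
    return knn_score(feature_distribution(js_code), training, k) > threshold


# Training uses normal pages only (semi-supervised anomaly detection).
normal_pages = [
    "function add(a, b) { return a + b; } var x = add(1, 2);",
    "var i; for (i = 0; i < 3; i++) { if (i) { document.title = i; } }",
    "function show(msg) { var el = document.body; return el; }",
]
training = [feature_distribution(p) for p in normal_pages]

benign = "function mul(a, b) { return a * b; } var y = mul(2, 3);"
suspicious = ("eval(unescape('%61')); eval(unescape('%62')); "
              "eval(String.fromCharCode(97));")
THRESHOLD = 0.3  # illustrative value; in practice tuned to a target false alarm rate
```

With these toy inputs, `is_anomalous(suspicious, training, THRESHOLD)` flags the page while `is_anomalous(benign, training, THRESHOLD)` does not. A real threshold would be chosen from held-out normal data rather than fixed by hand.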
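The weighting assumption in contribution (3), that a feature word appearing in more training samples should receive a larger weight, can be sketched as follows. The exact weighting formula is not given in the abstract, so the document-frequency fraction used here, along with the tokenizer and the function names, is an assumption for illustration only.

```python
import re
from collections import Counter

# Simplified tokenizer for identifier-like tokens (an assumption).
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def tokenize(js_code):
    """Split JavaScript source into identifier-like tokens."""
    return TOKEN_RE.findall(js_code)


def feature_word_weights(training_codes, vocabulary):
    """Weight each word by the fraction of training samples that contain it."""
    if not training_codes:
        return {w: 0.0 for w in vocabulary}
    token_sets = [set(tokenize(code)) for code in training_codes]
    return {w: sum(w in s for s in token_sets) / len(token_sets)
            for w in vocabulary}


def weighted_distribution(js_code, vocabulary, weights):
    """Feature-word relative frequencies scaled by the per-word weights."""
    tokens = tokenize(js_code)
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [weights[w] * counts[w] / total for w in vocabulary]
```

For example, with training samples `"var a = 1;"`, `"var b = eval(a);"` and `"function f() { var c; }"`, the word `var` appears in every sample and gets weight 1.0, while `eval` and `function` each appear in one of three samples and get weight 1/3.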
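Two of the evaluation metrics named in contribution (3), edit distance and Jaccard similarity, have standard definitions and can be sketched in a few lines of Python. How the thesis tokenizes code for the Jaccard comparison is not stated in the abstract; comparing sets of word-like tokens is an assumption here. Text feature similarity is not sketched because the abstract does not define it.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance between two strings, using a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def jaccard_similarity(a, b):
    """Jaccard similarity of the word-token sets of two code strings."""
    tokens_a = set(re.findall(r"\w+", a))
    tokens_b = set(re.findall(r"\w+", b))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty inputs are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

One way to use these is to compare de-obfuscated output against the known original code: a small edit distance and a Jaccard similarity close to 1 would suggest that the de-obfuscation recovered most of the original.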

Keywords/Search Tags:

malicious Web page, anomaly detection, drive-by-download attacks, Web security, obfuscation, de-obfuscation, static analysis, dynamic analysis, JavaScript

Related items

1	Detection And Prevention Of Malicious Websites
2	Research On Code Obfuscation Oriented To Control Flow
3	JavaScript Obfuscation Detection Methods Based On CNN
4	Research And Implementation Of Software Protection Technology Based On Code Obfuscation
5	The Research And Implementation Of Web Malware Detection Based On Page Content
6	Research On Path Branch Obfuscation Technique
7	Research And Application On Indistinguishability Obfuscation
8	Research On JavaScript Malicious Code Detection Technology Based On Machine Learning
9	Research On Design And Applications Of Cryptography Obfuscation
10	Research Into Malware Behavior Analysis And Obfuscation Detection Technology