Research And Implementation On Machine Learning-Based Detection Of Malicious Script Codes

Posted on:2012-08-25

Degree:Master

Type:Thesis

Country:China

Candidate:H B Chen

Full Text:PDF

GTID:2298330467967374

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet, malicious script code is spreading faster and faster, and the number of its types has grown greatly. Script compression and obfuscation techniques have also become increasingly popular, making malicious scripts more difficult to detect. This poses a serious threat to Internet security.

At present, virus detection methods are divided into static detection and dynamic detection. Static detection matches malicious features extracted by experts; it offers fast recognition and a high accuracy rate, but cannot identify unknown scripts. Dynamic detection runs scripts in a virtual environment and identifies them by their run-time behavior; it can identify new malicious scripts, but its detection efficiency is low. This dissertation focuses on JavaScript code, combines static text analysis with dynamic analysis of JavaScript machine code, and then applies machine learning algorithms to the resulting features. The main contributions of the dissertation are summarized as follows:

Firstly, we propose a method for identifying obfuscated JavaScript code. Some malicious scripts use obfuscation techniques to hide their source code and so avoid detection by rule-based anti-virus software. For obfuscated scripts, no sufficiently good detection tool exists. This dissertation studies various types of script obfuscation methods, and then uses the N-gram method and the K-nearest neighbor (KNN) classification algorithm to distinguish obfuscated scripts from non-obfuscated ones. This step is very important for malicious script detection.

Secondly, we propose a method for detecting obfuscated malicious scripts. In an obfuscated JavaScript script the text features are hidden, so the script is difficult to analyze with static methods. This dissertation uses the V8 script engine to compile obfuscated JavaScript to machine code, extracts N-gram features from the machine code, and classifies them with KNN. Experimental results show that the method can effectively identify obfuscated malicious scripts.

Thirdly, we propose a method for detecting non-obfuscated malicious scripts. Static analysis is used to obtain feature vectors, including certain characteristic functions, calls to system objects, and entropy statistics. The Support Vector Machine (SVM) algorithm is then used to train on samples and build predictive models. Experimental results show that the method can identify whether a script is malicious.

Lastly, we present a machine learning-based detection system that can detect obfuscated and malicious JavaScript.
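
The first contribution above, N-gram features classified with KNN, can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not state the N-gram unit, the value of N, the distance measure, or K, so character trigrams, cosine similarity, and K = 3 are assumptions made here, and the training scripts are hypothetical toy samples written for this example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a script's source text.

    Assumption: character-level n-grams; the abstract does not say
    whether characters, bytes, or tokens are used.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, sample, k=3, n=3):
    """Label `sample` by majority vote among its k most similar scripts.

    `train` is a list of (source_text, label) pairs. Cosine similarity
    and k=3 are illustrative choices, not values from the thesis.
    """
    query = char_ngrams(sample, n)
    scored = [(cosine_similarity(query, char_ngrams(text, n)), label)
              for text, label in train]
    # Highest similarity first; Python's sort is stable, so ties keep
    # their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set, hand-written for this sketch only.
TRAIN = [
    ("function add(a, b) { return a + b; }", "plain"),
    ("var total = 0; for (var i = 0; i < 10; i++) { total += i; }", "plain"),
    ("document.getElementById('msg').innerHTML = 'hello';", "plain"),
    ("eval(unescape('%66%75%6e%63%74%69%6f%6e%20%61%28%29'))", "obfuscated"),
    ("var _0x1a=['\\x68\\x65\\x6c\\x6c\\x6f'];eval(_0x1a[0]);", "obfuscated"),
    ("eval(unescape('%76%61%72%20%78%3d%31%3b%61%6c%65%72%74'))", "obfuscated"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "eval(unescape('%61%6c%65%72%74%28%31%29'))"))
    print(knn_predict(TRAIN, "function mul(a, b) { return a * b; }"))
```

The same two functions would apply unchanged to the second contribution if the input were a byte or opcode sequence produced by the script engine rather than source text, although how the dissertation extracts that sequence from V8 is not described in the abstract.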

Keywords/Search Tags:

malicious script, code obfuscation, features extraction, n-gram, knn, svm

Related items

1	Clustering Analysis Of Malicious Code Based On N-gram Feature Extraction
2	Application Research Of Automatic Code Obfuscation Technology In Script Source Code Encryption
3	Research On Technology Of Software Protection And Malicious Code Detection Based On Code Obfuscation
4	Research On Malicious PDF Document Static Detection Technology Based On Improved N-gram
5	Research And Development Of Malicious Code Detection System Based On N-GRAM
6	Research On Features Extraction Method Of Attack Group Based On Malicious Code Gene
7	Research And Implementation Of Code Obfuscation In Java Software Protection
8	Research On A Novel Adaptive Anti-obfuscation Model For Detecting Malicious Code
9	Research Of Code Obfuscation Technology For Java Program
10	The Code Obfuscation System Based On Control Obfuscation And Layout Obfuscation