
Research And Implementation On Machine Learning-Based Detection Of Malicious Script Codes

Posted on: 2012-08-25, Degree: Master, Type: Thesis
Country: China, Candidate: H B Chen
GTID: 2298330467967374, Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, malicious script code spreads faster and faster, and the number of its variants has greatly increased. Script compression and obfuscation techniques have also become increasingly common, which makes malicious scripts harder to detect. Malicious scripts are therefore a serious threat to Internet security.

Current virus detection methods fall into two categories: static detection and dynamic detection. Static detection matches malicious features identified by experts. It recognizes scripts quickly and with a high accuracy rate, but it cannot identify unknown scripts. Dynamic detection runs scripts in a virtual environment and identifies them by their runtime behavior. It can identify new malicious scripts, but its detection efficiency is low. This dissertation focuses on JavaScript code. It combines static text analysis with dynamic analysis of JavaScript machine code and then applies machine learning algorithms to the extracted features. The main contributions of the dissertation are summarized as follows.

Firstly, we propose a method for identifying obfuscated JavaScript code. Some malicious scripts use obfuscation techniques to hide their source code and avoid detection by rule-based anti-virus software, and no sufficiently good tool exists for detecting obfuscated scripts. This dissertation studies various types of script obfuscation and then uses the N-gram method and the K-nearest neighbor (KNN) classification algorithm to distinguish obfuscated scripts from non-obfuscated ones. This distinction is an important step in malicious script detection.

Secondly, we propose a method for detecting obfuscated malicious scripts. In obfuscated JavaScript the text features are hidden, so such scripts are difficult to analyze with static methods. This dissertation uses the V8 script engine to compile obfuscated JavaScript to machine code, extracts N-gram features from the machine code, and classifies them with the KNN method.
Experimental results show that this method can effectively identify obfuscated malicious scripts.

Thirdly, we propose a method for detecting non-obfuscated malicious scripts. A static method extracts the feature vectors, which include the use of certain characteristic functions, calls to system objects, and entropy statistics. The Support Vector Machine (SVM) algorithm is then trained on the samples to build a predictive model. Experimental results show that the method can identify whether a script is malicious.

Finally, we present a machine-learning-based detection system that determines whether JavaScript code is obfuscated and whether it is malicious.
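The N-gram and KNN combination used in the first two contributions can be illustrated with a minimal sketch. It assumes character trigrams as tokens, cosine similarity between n-gram count vectors, and a majority vote over k = 3 neighbors; the abstract does not specify n, k, or the similarity measure, so all three are assumptions. The training strings and labels are hypothetical toy data, not samples from the thesis. For the second contribution, the same `knn_predict` function would receive opcode sequences taken from V8-generated machine code instead of characters; extracting that machine code is not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n=3):
    """Count overlapping n-grams in a token sequence (characters or opcodes)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[g] for g, count in a.items() if g in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, sample, k=3, n=3):
    """Label `sample` by majority vote among its k most similar training items.

    `train` is a list of (token_sequence, label) pairs.
    """
    query = ngrams(sample, n)
    scored = sorted(
        ((cosine(query, ngrams(tokens, n)), label) for tokens, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set (illustrative only).
train = [
    ("var total = 0; for (var i = 0; i < items.length; i++) { total += items[i]; }", "plain"),
    ("function greet(name) { return 'Hello, ' + name; }", "plain"),
    ("document.getElementById('title').innerHTML = 'Welcome';", "plain"),
    ("eval(unescape('%76%61%72%20%61%3D%31%3B'))", "obfuscated"),
    ("eval(String.fromCharCode(118,97,114,32,120,61,49,59))", "obfuscated"),
    ("var _0x1a2b=['\\x65\\x76\\x61\\x6c'];window[_0x1a2b[0]]('\\x61\\x3d\\x31');", "obfuscated"),
]

if __name__ == "__main__":
    print(knn_predict(train, "eval(unescape('%64%6F%63%75%6D%65%6E%74'))"))  # -> obfuscated
    print(knn_predict(train, "function add(a, b) { return a + b; }"))  # -> plain
```

Cosine similarity is used here because it normalizes for script length, so a long benign file and a short one can still be compared by their n-gram profile; a Euclidean distance on raw counts would be an equally valid but length-sensitive choice.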
Keywords/Search Tags: malicious script, code obfuscation, feature extraction, N-gram, KNN, SVM
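The static feature vector described in the third contribution (characteristic functions, system object calls, entropy statistics) might look like the following sketch. The abstract does not list the specific functions or objects, so `SUSPICIOUS_CALLS` and `SYSTEM_OBJECTS` are hypothetical examples, and "entropy statistics" is assumed here to mean the character-level Shannon entropy of the script text. The resulting vectors would then be passed to an SVM trainer (for example scikit-learn's `sklearn.svm.SVC`), which is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical feature lists: the thesis does not enumerate its
# characteristic functions or system objects, so these are illustrative only.
SUSPICIOUS_CALLS = ["eval", "unescape", "escape", "String.fromCharCode", "setTimeout"]
SYSTEM_OBJECTS = ["document.write", "window.location", "ActiveXObject"]


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def static_features(script):
    """Build a numeric feature vector: call counts, object-use counts, entropy."""
    # Count calls such as `eval(`; \b keeps `escape(` from matching inside `unescape(`.
    vector = [
        len(re.findall(r"\b" + re.escape(name) + r"\s*\(", script))
        for name in SUSPICIOUS_CALLS
    ]
    # Plain substring counts for system object usage.
    vector += [script.count(name) for name in SYSTEM_OBJECTS]
    # Assumed entropy statistic: higher values often indicate packed or encoded text.
    vector.append(shannon_entropy(script))
    return vector


if __name__ == "__main__":
    print(static_features("eval(unescape('%61'))"))
```

Each script becomes a fixed-length vector of eight counts plus one entropy value, which is the shape an SVM expects; in practice the counts would usually be scaled before training so the entropy term is not dominated by large call counts.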