
Design and Implementation of a Web Trojan Detection System Based on Data Mining and Machine Learning

Posted on: 2015-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Shi
Full Text: PDF
GTID: 2308330473952545
Subject: Software engineering
Abstract/Summary:
Computer networks are changing the way people live, but the openness and connectivity of networks also make them prone to attack by criminals, so network security is attracting more and more attention. Trojans have been called the number one killer of network security: viruses, hacking, server crashes and other security problems all use Trojans as a carrier. Detection systems based on pattern matching are currently the most widely used security method, but they rely on manual analysis to extract signatures and cannot predict unknown malicious code. If malicious code is obfuscated or deformed, they are ineffective. Data mining and machine learning are currently active research fields, and combining the two technologies to detect Trojans is a likely direction for future work. To address these problems, this thesis designs and implements a Trojan detection system targeting malicious JavaScript. The main contents of this thesis are:

1. First, we introduce the main principles and theoretical background of data mining and machine learning techniques. We then summarize the mainstream web Trojan detection algorithms that have emerged domestically and internationally, and analyze the advantages and disadvantages of each.

2. Most web Trojans currently embed malicious JavaScript code in the page, so this thesis focuses on detecting malicious JavaScript. To evade detection by anti-virus software, malicious code is often obfuscated or deformed, so conventional detection based on feature matching is largely ineffective. We compile the scripts to machine code using the Google V8 engine and extract opcode n-grams from the machine instructions. The 200 most frequent grams are used as the features that distinguish normal scripts from malicious scripts.

3. We use a web crawler to collect 100 normal and 100 malicious scripts from the Internet as the sample set. A BP neural network ensemble classifier is then trained on the sample data, and 4-fold cross-validation is used to evaluate the accuracy and the correct rate of the detection method.

4. Finally, we design and implement a prototype Trojan detection system using VC++ 6.0, MySQL and other tools. The system includes a web crawler, a feature extraction module, a BP neural network ensemble classifier module and other components.
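The feature-extraction and validation steps described in items 2 and 3 can be illustrated with a minimal Python sketch. It is not the thesis implementation (the prototype was built with VC++ 6.0), and it rests on several assumptions: each script has already been reduced to a list of opcode mnemonics, the n-gram length `n` is a free parameter because the abstract does not state it, and the function names `opcode_ngrams`, `top_k_grams`, `feature_vector` and `k_fold_indices` are hypothetical. The BP neural network ensemble itself is left out.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Gram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> List[Gram]:
    """Return every contiguous run of n opcodes, in order of appearance."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def top_k_grams(samples: Sequence[Sequence[str]], n: int, k: int) -> List[Gram]:
    """Select the k most frequent n-grams over all samples (k = 200 in the abstract)."""
    counts: Counter = Counter()
    for opcodes in samples:
        counts.update(opcode_ngrams(opcodes, n))
    return [gram for gram, _ in counts.most_common(k)]


def feature_vector(opcodes: Sequence[str], vocabulary: Sequence[Gram], n: int) -> List[int]:
    """Count how often each vocabulary n-gram occurs in one sample."""
    counts = Counter(opcode_ngrams(opcodes, n))
    return [counts[gram] for gram in vocabulary]


def k_fold_indices(num_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, test) pairs; each index is tested exactly once."""
    folds = [list(range(start, num_samples, k)) for start in range(k)]
    splits = []
    for held_out in range(k):
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        splits.append((train, folds[held_out]))
    return splits


if __name__ == "__main__":
    # Hypothetical opcode sequences standing in for disassembled V8 output.
    samples = [
        ["mov", "push", "call", "mov", "push"],
        ["mov", "push", "ret"],
    ]
    vocabulary = top_k_grams(samples, n=2, k=3)
    for sample in samples:
        print(feature_vector(sample, vocabulary, n=2))
    # 200 samples and 4 folds, matching the sample set size in the abstract.
    for train, test in k_fold_indices(200, 4):
        print(len(train), len(test))
```

With 200 samples and 4 folds, each round trains on 150 scripts and tests on the remaining 50. In a real evaluation the vocabulary would be selected from the training fold only, so that the test scripts do not influence feature selection.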
Keywords/Search Tags: web-Trojan, JavaScript, machine learning, data mining, integrated classifier