Research On Malicious Webpage And PDF Document Detection Based On SVM Model

Posted on: 2015-10-10  Degree: Master  Type: Thesis
Country: China  Candidate: S J Yang  Full Text: PDF
GTID: 2298330467988807  Subject: Computer technology
Abstract/Summary:
The Internet delivers information services more conveniently and quickly than traditional channels. However, the openness and vulnerability of the network also make things easier for attackers. Among current network attacks, the most popular is to embed exploit code in a benign webpage, which then downloads a malicious executable automatically without the user's knowledge. This kind of attack poses a serious threat to Internet security. Traditional anti-virus engines have difficulty detecting obfuscated malicious code in a webpage or PDF document, because static signatures can only match readable, unencrypted code. In addition, the static signature database keeps growing over time without bound. For these reasons, it is worthwhile to study new detection techniques for identifying obfuscated malicious code embedded in webpages or PDF documents.

In this paper, the structure of webpages and PDF documents is analyzed first. Then the support vector machine (SVM), which is based on statistical learning theory, is introduced and trained on features extracted from samples to learn a classification model. A dynamic emulation tool is applied to execute shellcode that may be embedded in malicious JavaScript, in order to analyze its specific behaviors. With these techniques, obfuscated malicious code in a webpage or PDF document can be detected. The main work of this paper is as follows:

(1) Chapter 2 presents an overview of attack and defense techniques for webpage Trojans. It introduces the Trojan's attack principle and typical attack methods, as well as the corresponding defense techniques, and points out their advantages and disadvantages.

(2) To overcome the weaknesses of traditional anti-virus engines, this paper uses an SVM based on statistical learning theory to detect webpage Trojans, instead of the traditional signature-matching approach. Specifically, the suspicious JavaScript is first extracted from the sample, and the suspicious characters in the extracted JavaScript are then counted to train the SVM model. Finally, the SVM classifier is applied to divide the suspicious feature sets into a malicious class and a benign class.

(3) A PDF analysis model is designed to analyze the stream objects in a PDF document. By applying static analysis, suspicious JavaScript can be extracted for further detection with the SVM classifier.

(4) A dynamic emulation tool is introduced to help analysts learn more about the malicious behaviors of shellcode embedded in malicious JavaScript.
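
The extraction and counting steps described in item (2) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the thesis's actual implementation: the feature set (script length, number of escape sequences, number of suspicious calls, ratio of non-alphanumeric characters) and the names `ScriptExtractor`, `extract_scripts`, `extract_features`, and `SUSPICIOUS_CALLS` are assumptions chosen for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical list of JavaScript calls often associated with obfuscation.
# The abstract does not say which tokens the thesis actually counts.
SUSPICIOUS_CALLS = ("eval", "unescape", "fromCharCode", "document.write")

# Matches %uXXXX, \xXX and %XX style escape sequences.
ESCAPE_RE = re.compile(r"%u[0-9a-fA-F]{4}|\\x[0-9a-fA-F]{2}|%[0-9a-fA-F]{2}")


class ScriptExtractor(HTMLParser):
    """Collects the text of inline <script> elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append to the last one.
        if self._in_script:
            self.scripts[-1] += data


def extract_scripts(html):
    """Return the non-empty inline scripts found in an HTML string."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return [s for s in parser.scripts if s.strip()]


def extract_features(js):
    """Turn one script into a small numeric feature vector (assumed features)."""
    length = len(js)
    escapes = len(ESCAPE_RE.findall(js))
    calls = sum(js.count(name) for name in SUSPICIOUS_CALLS)
    non_alnum = sum(1 for ch in js if not ch.isalnum() and not ch.isspace())
    ratio = non_alnum / length if length else 0.0
    return [length, escapes, calls, ratio]


if __name__ == "__main__":
    page = '<html><script>var s = unescape("%u9090%u9090"); eval(s);</script></html>'
    for script in extract_scripts(page):
        print(extract_features(script))
```

Feature vectors of this kind, labeled as malicious or benign, could then be passed to any SVM implementation for training and classification, for example `sklearn.svm.SVC` from scikit-learn, if that library is available.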
Keywords/Search Tags: Webpage Trojan, Support Vector Machine, PDF document, JavaScript engine, Shellcode