Research On Detection Algorithm For Malicious Word And PDF Documents

Posted on: 2018-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: X D Tian
Full Text: PDF
GTID: 2348330518499445
Subject: Engineering
Abstract/Summary:
With the rapid development of computer networks, more and more people are paying attention to protecting their personal privacy and important data. However, the emergence of a wide variety of malicious documents has caused great harm to everyday users. Microsoft Word and PDF documents are among the most widely used formats for editing and viewing, which makes them a prime target. Attacks that exploit defects in these document formats appear one after another, their number is increasing dramatically, and they cause irreversible losses to users. A detection algorithm for suspicious documents would therefore greatly reduce the harm caused by malicious Word and PDF documents.

To address these problems, this thesis introduces the security background and common attacks involving malicious Word and PDF documents, and then reviews recent research on document detection. Existing static detection suffers from low accuracy, while dynamic detection suffers from long detection times. Machine learning can learn from data and uncover hidden statistical regularities, so more and more security researchers are applying it to malware detection. Building on existing research, this thesis proposes two faster and more effective algorithms based on machine learning:

1) Dynamic Detection of Malicious Word and PDF Based on API Behavior and Deep Learning Model Inception V3

Sandboxing is the most commonly used dynamic analysis technique, but it incurs time overhead and depends on a virtualized instruction system. Using an improved Cuckoo Sandbox, this thesis designs a malicious document detection algorithm based on the deep learning model GoogLeNet Inception V3. The results produced by running documents in Cuckoo Sandbox are abstracted according to API dependency, and the resulting document feature vector is converted into a two-dimensional image. When the image is fed to the Inception V3 network, the network extracts the bottleneck features, and a classifier is then trained on them using transfer learning to produce the detection result. Experiments show that this algorithm achieves good time performance on unknown malicious Word and PDF documents, with a detection rate of 89.1%.

2) Static Detection of Malicious PDF Based on K-means Cluster and Deep Text Feature Detection Network

Traditional static detection of PDF documents generally targets one specific attack, and its detection rate is too low. To address these problems, this thesis designs a static detection algorithm for PDF documents that consists of two parts: extraction of distinguishing text features based on K-means, and classification based on a deep text feature detection network. The extraction step uses PDFMiner and K-means clustering to obtain the text features that distinguish malicious documents from benign ones. The deep text feature detection network is a purpose-designed 15-layer deep linear neural network. Experiments show that this algorithm achieves good results on unknown malicious PDF documents, with a detection rate of 86.6%, and that it can also effectively detect malicious PDF documents under different attacks.
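The abstract does not say how the first algorithm turns a document's API behavior into a two-dimensional image, so the following is only a minimal sketch of one plausible approach, not the thesis's method. It assumes the sandbox run has already been reduced to a flat list of API call names, counts each call against a fixed vocabulary, scales the counts to grayscale values, and pads the vector into a square grid. The vocabulary `API_VOCAB` and the helper names `api_feature_vector` and `vector_to_image` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical vocabulary: the abstract does not list which API calls are tracked.
API_VOCAB = [
    "CreateFileW", "WriteFile", "RegSetValueExW", "CreateProcessInternalW",
    "URLDownloadToFileW", "VirtualAllocEx", "WriteProcessMemory",
]


def api_feature_vector(calls, vocab=API_VOCAB):
    """Count how often each vocabulary API name appears in a call trace."""
    counts = Counter(calls)
    return [counts[name] for name in vocab]


def vector_to_image(vector):
    """Scale a feature vector to 0..255 and lay it out as a square grid.

    The grid is the smallest square that holds the vector; unused cells
    are zero-padded. Scaling by the maximum count is an assumption.
    """
    side = math.isqrt(len(vector))
    if side * side < len(vector):
        side += 1
    peak = max(vector) if vector and max(vector) > 0 else 1
    pixels = [(255 * value) // peak for value in vector]
    pixels += [0] * (side * side - len(pixels))
    return [pixels[row * side:(row + 1) * side] for row in range(side)]


if __name__ == "__main__":
    trace = ["CreateFileW", "WriteFile", "WriteFile", "URLDownloadToFileW"]
    for row in vector_to_image(api_feature_vector(trace)):
        print(row)
```

In a full pipeline, a grid like this would still have to be resized and expanded to three channels before it matches the input shape that Inception V3 expects; that step is omitted here.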
Keywords/Search Tags: Malicious Document, TensorFlow, Deep Learning, Static Analysis, Dynamic Analysis