
The Research On Detecting Malware Based On Opcode N-gram And Machine Learning

Posted on: 2018-09-04; Degree: Master; Type: Thesis
Country: China; Candidate: P F Li; Full Text: PDF
GTID: 2348330518993314; Subject: Information security
Abstract/Summary:
Computer and communications infrastructure is currently highly vulnerable to many types of attack. Malware (also known as malicious code) is the collective name for the software behind these attacks, including but not limited to worms, viruses, and Trojans. Malware not only spreads quickly but also damages the interests of individuals, businesses, and governments. In recent years, the rapid growth of network connectivity has allowed malware to spread and infect hosts at an ever faster rate, so new malware must be detected and eliminated quickly. The main way to protect legitimate users from attack is to install anti-virus software, which relies mainly on signature-based malware detection. Producing signatures is extremely labor-intensive: it is time-consuming and costly and, most importantly, requires experts. The signature-based approach is also limited to recognizing known malware and cannot reliably and effectively identify new malware. To address these problems, researchers have proposed heuristic detection methods, which primarily use data mining and machine learning to learn patterns that characterize malware. Generally speaking, the heuristic detection process can be divided into two stages: feature extraction and classification. To this end, this thesis studies the technologies related to malicious code, packing and unpacking, disassembly, feature selection, and classifiers, and it designs and implements a malware detection method and system based on opcode sequences and machine learning.
First, a large number of malicious and benign samples are collected; after checking for packers, suitable samples are selected for disassembly and their opcode sequences are extracted. Next, two feature selection methods, information gain and Categorical Proportional Difference, are compared. Finally, the classifier is trained and tested. A series of experiments and analyses, carried out from a new perspective, shows that malware detection performs better when the opcode sequence length is small (less than 8), malicious files make up 50% of the samples, information gain is used for feature selection, and the training set is larger.
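The feature extraction and feature selection steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`opcode_ngrams`, `entropy`, `score_features`), the toy opcode lists, and the choice to treat each n-gram as a binary present/absent feature per sample are all assumptions made for illustration. Categorical Proportional Difference is computed here in its two-class form, |A - B| / (A + B), where A and B are the numbers of malicious and benign samples containing the n-gram.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return the set of opcode n-grams (as tuples) present in one sample."""
    return {tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)}


def entropy(pos, neg):
    """Binary entropy (in bits) of a group with `pos` and `neg` members."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def score_features(samples, labels, n):
    """Score every n-gram by (information gain, CPD).

    samples: list of opcode lists (hypothetical disassembly output)
    labels:  1 for malicious, 0 for benign
    """
    grams = [opcode_ngrams(s, n) for s in samples]
    total = len(labels)
    n_mal = sum(labels)
    n_ben = total - n_mal
    base = entropy(n_mal, n_ben)

    # Number of malicious / benign samples that contain each n-gram.
    mal_df = Counter(g for gs, y in zip(grams, labels) if y == 1 for g in gs)
    ben_df = Counter(g for gs, y in zip(grams, labels) if y == 0 for g in gs)

    scores = {}
    for g in set(mal_df) | set(ben_df):
        a, b = mal_df[g], ben_df[g]
        present, absent = a + b, total - (a + b)
        # Class entropy remaining after splitting on presence of g.
        cond = (present * entropy(a, b)
                + absent * entropy(n_mal - a, n_ben - b)) / total
        info_gain = base - cond
        cpd = abs(a - b) / (a + b)
        scores[g] = (info_gain, cpd)
    return scores


# Toy data, invented for illustration only.
samples = [
    ["push", "mov", "call", "xor"],   # malicious
    ["push", "mov", "call", "jmp"],   # malicious
    ["mov", "add", "ret"],            # benign
    ["mov", "add", "pop", "ret"],     # benign
]
labels = [1, 1, 0, 0]
scores = score_features(samples, labels, n=2)
for gram, (ig, cpd) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
    print(gram, round(ig, 3), round(cpd, 3))
```

On this toy data, an n-gram that occurs in only one malicious sample receives the maximum CPD score but a low information gain, which hints at why the two criteria can rank features differently. The top-ranked n-grams would then form the feature vector passed to whichever classifier is being trained.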
Keywords/Search Tags: machine learning, opcode sequence, malware detection