Research On Malware Detection Based On Improved Information Gain And LDA

Posted on: 2018-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2348330518499189
Subject: Computer technology
Abstract/Summary:
In recent years, the rapid development of information technology and the growing popularity of the Internet have brought more and more network information security problems, among which malicious software is one of the most prominent. Traditional malware detection is mainly static and relies heavily on a signature database. With the number of malware samples growing explosively, static detection struggles to handle new malware and its efficiency keeps falling, so it no longer meets practical needs. Dynamic detection, which captures the Windows API call behavior of a program, has become a research hotspot. The detection process consists of several key steps, and feature selection is one of the most important. Building on dynamic malware detection and the key technique of API feature selection, this thesis studies malware detection.

First, samples of malware and non-malware are collected from professional websites and forums in China and abroad. In a dynamic monitoring environment, the WinAPIOverride tool is used to capture the API call logs of the sample software, and the API call names are extracted from those logs as the basic features for malware detection.

Then, taking the traditional information gain feature selection method as the object of study, this thesis analyzes its shortcomings in malware detection: it considers neither word frequency nor the class distribution.
To address these problems, this thesis implements a new method that improves traditional information gain feature selection by introducing relative word frequency and a class dispersion index. Comparative experiments show that malware detection based on the improved information gain feature selection outperforms detection based on the traditional method.

Finally, because feature selection methods based on mathematical statistics may leave redundant features, this thesis proposes a method that combines the improved information gain with LDA. In the feature selection step, the improved information gain performs an initial dimensionality reduction, and an LDA model is then trained to extract topic features that further distinguish the classes. Comparative experiments show that combining the improved information gain with LDA achieves better detection results than the improved information gain feature selection alone.
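
To make the baseline concrete, the following is a minimal sketch of traditional information gain over API call names, treating each API as a binary feature (present or absent in a sample's call log). It is not the thesis's code: the function names, the toy call logs, and the labels are all hypothetical, and the improved variant (relative word frequency and class dispersion index) is not shown because the abstract does not give its formula.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels (0.0 if empty)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(samples, labels, api_name):
    """Information gain of the binary feature 'api_name occurs in the log'."""
    present = [y for calls, y in zip(samples, labels) if api_name in calls]
    absent = [y for calls, y in zip(samples, labels) if api_name not in calls]
    total = len(labels)
    conditional = ((len(present) / total) * entropy(present)
                   + (len(absent) / total) * entropy(absent))
    return entropy(labels) - conditional


def top_k_apis(samples, labels, k):
    """Rank every API name by information gain and keep the k best."""
    vocab = sorted({api for calls in samples for api in calls})
    scored = [(information_gain(samples, labels, api), api) for api in vocab]
    # Highest gain first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [api for _, api in scored[:k]]


# Hypothetical toy call logs: one list of API names per sample.
samples = [
    ["CreateFileW", "WriteProcessMemory", "WriteProcessMemory", "WriteProcessMemory"],
    ["CreateFileW", "WriteProcessMemory"],
    ["CreateFileW", "ReadFile"],
    ["CreateFileW", "ReadFile", "ReadFile"],
]
labels = ["malware", "malware", "benign", "benign"]

selected = top_k_apis(samples, labels, k=2)
```

In this toy data, `CreateFileW` appears in every sample and so carries no information, while the other two names separate the classes perfectly. Because the feature is presence or absence, the three `WriteProcessMemory` calls in the first sample count exactly the same as the single call in the second, which illustrates the word-frequency information that the abstract says the traditional method ignores.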
Keywords/Search Tags: Windows API call, text classification, information gain, LDA