
Malware Identification Based On Fuzzy KNN And Visualization Analysis

Posted on: 2020-11-12 | Degree: Master | Type: Thesis
Country: China | Candidate: M D Tang | Full Text: PDF
GTID: 2428330599964891 | Subject: Software engineering
Abstract/Summary:
The rapid development of the Internet has brought great convenience to society, but computer security issues have followed as well. Malware is one of the most important threats: its high risk poses a serious danger to individuals, organizations, finance, the military and even entire countries. The study of malware has therefore always been a research hotspot in the field of computer security. With the rapid development of automatic malware generation and obfuscation technology, the types and quantities of malware have exploded, and traditional detection methods based on signature matching can no longer meet the new security requirements. Machine learning and deep learning play an increasingly important role in malware research. The main work of this thesis includes:

1. A Fuzzy KNN (FKNN) algorithm is proposed by combining fuzzy set theory with the KNN algorithm, and it is applied to malware identification. In the feature extraction phase, the PE file structure information of the malware is first extracted through static analysis, and fuzzy set theory is then used to generate the fuzzy vector of the malware. The "maximum fuzzy region matching principle" is used to filter out the interference of outliers, and the Euclidean distance between fuzzy vectors is calculated to find the k nearest neighbors. In the classification phase, the reciprocal of each neighbor's index is assigned as its voting weight, which handles unbalanced datasets better. Finally, the class with the largest sum of voting weights is used as the predicted label. Verified on the public dataset ClaMP, the FKNN algorithm achieves an accuracy of 0.952, a recall of 0.977 and an AUC of 0.99, which is superior to Classical KNN (CKNN), Local Mean KNN, SVM and the other comparison algorithms.

2. A dynamic API call sequence visualization method is proposed and combined with deep learning to perform malware classification. The method takes into account factors such as the type, time and frequency of the API calls made during dynamic execution, and generates a feature image that reflects the behavior pattern of the malware. A convolutional neural network (CNN) is used to learn and classify the feature images, thereby indirectly classifying the malware. Experiments show that the method achieves a classification accuracy of 0.993, a recall of 0.993 and a false positive rate (FPR) of 0.00085 when classifying nine malware families. As the number of test samples increases, the time consumed in the classification phase remains at the millisecond level, so the method is both accurate and efficient.
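The voting step of the first contribution can be illustrated with a small sketch. It is not the thesis implementation: it assumes that "index" means the rank of a neighbor when the k nearest are sorted by Euclidean distance (closest first, starting at 1), and it omits the fuzzy vector generation and the maximum fuzzy region matching step, which the abstract does not specify. The function name `rank_weighted_knn_predict` and the toy data are hypothetical.

```python
import numpy as np


def rank_weighted_knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` using reciprocal-rank weighted votes.

    Assumption: the neighbour at rank r (1 = closest) votes with weight 1/r.
    """
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)

    # Euclidean distance from the query to every training vector.
    dists = np.linalg.norm(train_x - query, axis=1)

    # Indices of the k nearest neighbours, closest first.
    nearest = np.argsort(dists, kind="stable")[:k]

    # Sum the reciprocal-rank weights per class.
    votes = {}
    for rank, idx in enumerate(nearest, start=1):
        label = train_y[idx]
        votes[label] = votes.get(label, 0.0) + 1.0 / rank

    # The class with the largest total weight is the prediction.
    return max(votes, key=votes.get), votes


# Hypothetical toy data: one benign sample and three malware samples.
train_x = [[0.0, 0.0], [1.0, 0.0], [1.2, 0.0], [5.0, 5.0]]
train_y = ["benign", "malware", "malware", "malware"]

label, votes = rank_weighted_knn_predict(train_x, train_y, [0.4, 0.0], k=3)
print(label, votes)
```

In this toy case two of the three nearest neighbors are malware, so a plain majority vote would predict malware. With reciprocal-rank weights the closest neighbor (benign, weight 1) outweighs the second and third neighbors combined (1/2 + 1/3), which is one way such a weighting can help the minority class in an unbalanced dataset.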
Keywords/Search Tags: Malware, k-nearest neighbors, visualization, deep learning, static analysis, API call sequence