
Research On Incremental Clustering And Indexing Of Malware

Posted on: 2017-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2428330569498856
Subject: Computer Science and Technology
Abstract/Summary:
Today, the number of malicious code samples is growing exponentially. Most anti-virus vendors use machine learning, data management, and related methods to build automated analysis pipelines that can identify and analyze large numbers of unknown binary samples. As a key part of such a malicious code analysis system, incremental clustering and retrieval of malicious code is of great practical significance. This thesis carries out a series of studies driven by the practical requirements of malicious code analysis and based on a real malicious code dataset. The main contributions of this thesis are:
(1) Feature selection for malicious code clustering and the combination of multiple features are discussed systematically. The features commonly used for clustering analysis each have advantages and disadvantages, but combining multiple features lets them complement one another to a certain extent. Experimental results show that multi-feature combination yields better clustering output than a single feature.
(2) GLIC, an incremental clustering algorithm for malicious code based on grid density and linear discriminant analysis, is proposed for the first time. GLIC combines grid density with linear discriminant analysis, which effectively reduces the dimensionality of high-dimensional samples, and it can update malicious code clusters in real time. Experiments on real malicious code data show that the algorithm has a small memory footprint and high time efficiency.
(3) C2FM, a fast function matching algorithm for malicious code based on control flow graphs and call graphs, is proposed. C2FM is used for malicious code retrieval on top of the clustering results. It uses the control flow graph and call graph of functions to match functions between different samples, which avoids computing similarity for all function pairs. Experiments show that, compared with other algorithms, C2FM has better time efficiency, and its matching accuracy can exceed 93%.
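The abstract does not spell out how GLIC works internally, so the following is only a generic illustration of the grid-density idea behind incremental clustering, not the thesis's algorithm. It assumes samples have already been reduced to a few numeric dimensions (the role the abstract assigns to linear discriminant analysis). The class name `GridDensityClusterer` and the parameters `cell_size` and `min_points` are hypothetical names chosen for this sketch.

```python
from collections import defaultdict
from itertools import product


class GridDensityClusterer:
    """Toy incremental grid-density clusterer (illustrative only, not GLIC)."""

    def __init__(self, cell_size=1.0, min_points=2):
        self.cell_size = cell_size      # edge length of one grid cell (assumed)
        self.min_points = min_points    # a cell with this many samples is "dense"
        self.counts = defaultdict(int)  # cell coordinates -> number of samples

    def _cell(self, point):
        # Map a point to the integer coordinates of the grid cell containing it.
        return tuple(int(x // self.cell_size) for x in point)

    def add(self, point):
        """Insert one sample; only the counter of its own cell is updated."""
        cell = self._cell(point)
        self.counts[cell] += 1
        return cell

    def clusters(self):
        """Return clusters as sets of adjacent dense cells."""
        dense = {c for c, n in self.counts.items() if n >= self.min_points}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, component = [start], set()
            while stack:
                cell = stack.pop()
                component.add(cell)
                # Visit every neighbouring cell (including diagonals).
                for offset in product((-1, 0, 1), repeat=len(cell)):
                    neighbour = tuple(c + o for c, o in zip(cell, offset))
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
            result.append(component)
        return result


clusterer = GridDensityClusterer(cell_size=1.0, min_points=2)
for sample in [(0.1, 0.2), (0.3, 0.4), (1.1, 0.5), (1.2, 0.6),
               (5.0, 5.0), (5.5, 5.5), (9.0, 9.0)]:
    clusterer.add(sample)
print(clusterer.clusters())
```

In this sketch a new sample only increments one cell counter, so memory grows with the number of occupied cells rather than with the number of samples. Neighbour enumeration grows as 3^d in the dimension d, which is one reason a grid-based approach is usually paired with dimensionality reduction.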
Keywords/Search Tags: Malicious Code, Feature Selection, Incremental Clustering, Retrieval Technology, Function Matching