| Malware illegally invades user devices and systems and steals private user information, posing severe challenges to the information security and property security of users and society. Malware detection has therefore become an issue of great concern to security companies and researchers. To detect malware more accurately, this paper proposes a series of new malware detection methods for the Windows platform. The specific work and contributions of this paper are as follows: (1) For the binary byte features of malware, a malware detection method based on multi-byte frequency-domain visualization is proposed. The method calculates the transition frequency between preceding and following bytes in assembly instructions of different lengths using multi-order Markov frequencies, constructs a multi-order Markov image in the frequency domain, and reduces its dimensionality with PCA. This approach accounts for the semantic factors of malware, whose effect on accuracy is not considered in traditional image-texture feature research, and it also avoids the information loss caused by image size normalization before deep learning training. Experimental results with a DCNN classifier show that the multi-order Markov images obtained by multi-byte frequency-domain visualization significantly improve classification accuracy. (2) For malware assembly code obtained through disassembly, a feature selection method named ΔGini-TFICF is proposed, which combines Gini impurity gain with TF-ICF to improve malware detection performance. First, the traditional TF-IDF algorithm is improved to give TF-ICF, which highlights how a term is distributed within the same family and across different families. Assembly instructions are then selected according to their distribution within and across families and the expected error rate, and the selected instructions are used to classify the malware. Experimental results show that the subset of malware features
obtained by this method can effectively improve the classification accuracy of machine learning algorithms. It also provides a new research direction for malware feature selection. (3) For dynamically obtained malware API call information, a dynamic malware detection method based on graph representation learning is proposed. The API name, the API type, and some of the function arguments are used as features. The API call sequence, represented by API names, is vectorized with Word2Vec. The API type and arguments are then feature-encoded and vectorized using the Sørensen-Dice coefficient. An API call sequence graph is built with the API name as node information and the API type and arguments as edge information, capturing the topological associations "API call sequence-API argument" and "API call sequence-API call sequence". A deep learning model is then constructed that combines an improved graph isomorphism network with a graph attention network. Compared with other methods, the graph representation learning method proposed in this paper greatly improves the accuracy of malware classification and detection, indicating that it is more applicable to the dynamic detection of malware. |
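
As a rough illustration of contribution (1), the sketch below builds a first-order byte-transition (Markov) matrix from a byte string: entry (i, j) is the relative frequency with which byte j follows byte i. It covers only the single-byte, first-order case. The multi-order construction over multi-byte instructions, the PCA step, and the DCNN classifier are not reproduced here. The function name `markov_image` and the sample bytes are hypothetical choices made for this example.

```python
from collections import Counter


def markov_image(data: bytes) -> list[list[float]]:
    """Return a 256x256 row-normalized byte-transition matrix.

    Assumption: first-order transitions between single bytes only.
    """
    # Count every (previous byte, next byte) pair in the input.
    counts = Counter(zip(data, data[1:]))

    # Total number of transitions leaving each byte value.
    row_totals = [0] * 256
    for (prev, _), n in counts.items():
        row_totals[prev] += n

    # Convert counts to transition frequencies; rows never seen stay all zero.
    image = [[0.0] * 256 for _ in range(256)]
    for (prev, nxt), n in counts.items():
        image[prev][nxt] = n / row_totals[prev]
    return image


# Illustrative bytes only, not taken from any real sample.
sample = bytes([0x55, 0x8B, 0xEC, 0x55, 0x8B, 0x5D])
img = markov_image(sample)
```

Because the matrix always has a fixed 256x256 shape regardless of file length, a representation of this kind needs no resizing before it is passed to an image classifier.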
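
Contribution (3) relies on the Sørensen-Dice coefficient, which for two sets A and B is 2|A ∩ B| / (|A| + |B|). A minimal sketch follows. Comparing strings as sets of character bigrams is an assumption made here for illustration and is not necessarily the encoding used in this paper. The helper names `bigrams` and `dice` are hypothetical.

```python
def bigrams(text: str) -> set[str]:
    """Return the set of adjacent character pairs in text (assumed tokenization)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice(a: set, b: set) -> float:
    """Sørensen-Dice coefficient of two sets, in the range [0, 1]."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Illustrative strings only; they share 9 of their 10 bigrams each.
score = dice(bigrams("CreateFileW"), bigrams("CreateFileA"))
```

A score near 1 means the two values share most of their bigrams, and a score of 0 means they share none.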