| With the continuous development of artificial intelligence, machine learning and deep learning have achieved significant breakthroughs in fields such as image recognition, speech synthesis, and text translation, and network and information security has also benefited greatly from them. Malware is one of the most common forms of network attack, through which attackers carry out malicious activities on computer systems. To address the low accuracy, insufficient feature extraction, and small sample sizes of malware classification models, this thesis applies static analysis and machine learning algorithms to malware classification and detection, providing more effective malware classification models and helping protect computer systems from attack. The specific research work is as follows. 1) This thesis proposes a multi-class malware detection technique based on N-Gram. First, the N-Gram method was used to extract byte sequences of length 2 from malware samples. Second, multi-class malware models were trained with machine learning algorithms such as KNN, random forest, and XGBoost. Then, the confusion matrix and the logarithmic loss function were used to evaluate the models. Finally, the models were trained and tested on the BIG2015 dataset, using the 10,868 BYTE files in the training data for malware family classification. The experimental results show that the multi-class models based on random forest and XGBoost reach accuracies of 97.93% and 98.43%, with Log Loss values of 0.026946 and 0.022240, respectively. 2) This thesis proposes a binary malware detection technique based on SE-ResNet. First, 48,651 samples in Windows PE format were collected to construct a malware dataset, named VT2022. The APIs of the samples were extracted through VirusTotal and integrated
into 1,054 unique APIs, which serve as the malicious features for training the model. Second, Boolean vectors were used to encode these features for both malicious and normal samples, and the SMOTE algorithm was used to address the class imbalance in the dataset. A binary malware classification model based on SE-ResNet was then constructed to detect the samples in the dataset. Finally, the effectiveness of the SE-ResNet method was evaluated on the VT2022 dataset, and the impact of the parameter r in the SENet model on the experimental results was studied. The experimental results show that the SE-ResNet-based binary detection model reaches an accuracy of 97.16%, and 99.58% when the data is balanced. 3) This thesis proposes a malware detection technique based on AutoML, with experiments run on Linux. First, in the data stage, sample information from the VT2022 dataset was used to predict whether samples are malicious or normal. Second, in the modeling stage, a malware detection model was built on the tabular prediction model in AutoGluon. During the experiments, a total of 12 detection models were trained, and the best model for the given dataset was selected automatically. Finally, in the deployment stage, the test data was loaded into the best model to predict sample categories, and the predictions were compared with the actual labels to calculate the accuracy of the detection model. The experimental results show that the malware detection model based on WeightedEnsemble_L2 achieves a prediction accuracy of 99.98% and a testing accuracy of 98.36%. |
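
The first contribution (byte sequences of length 2, evaluated with logarithmic loss) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes each line of a BYTE file holds a hexadecimal address followed by hexadecimal byte tokens, with `??` kept as its own token. The function names `parse_bytes_line`, `byte_bigrams`, and `multiclass_log_loss` are hypothetical, chosen only for this illustration.

```python
import math
from collections import Counter


def parse_bytes_line(line):
    """Return the byte tokens of one line, dropping the leading address.

    Assumed line layout: '<address> <byte> <byte> ...', e.g. '00401000 56 8D 44'.
    """
    return [token.upper() for token in line.split()[1:]]


def byte_bigrams(lines):
    """Count every pair of consecutive byte tokens (N-Gram with N = 2)."""
    tokens = []
    for line in lines:
        tokens.extend(parse_bytes_line(line))
    # zip pairs each token with its successor, giving overlapping 2-grams.
    return Counter(zip(tokens, tokens[1:]))


def multiclass_log_loss(true_labels, probabilities, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(true_labels, probabilities):
        # Clip to avoid log(0) when a model outputs exactly 0 or 1.
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(true_labels)
```

The resulting counts could be turned into a fixed-length feature vector for a classifier such as random forest or XGBoost; lower log loss indicates that the model assigns higher probability to the correct malware family.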
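
The Boolean feature encoding in the second contribution can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the thesis's code: each sample is assumed to be a list of API names already extracted for it, and the helpers `build_vocabulary` and `to_boolean_vector` are hypothetical names. In the thesis the vocabulary holds 1,054 unique APIs; the sketch works for any size.

```python
def build_vocabulary(samples):
    """Collect the unique API names across all samples, in sorted order."""
    return sorted({api for apis in samples for api in apis})


def to_boolean_vector(apis, vocabulary):
    """Return 1 for each vocabulary API present in the sample, else 0."""
    present = set(apis)
    return [1 if api in present else 0 for api in vocabulary]


def encode_samples(samples):
    """Encode every sample against a shared vocabulary."""
    vocabulary = build_vocabulary(samples)
    vectors = [to_boolean_vector(apis, vocabulary) for apis in samples]
    return vocabulary, vectors
```

Each sample then maps to a vector of the same length regardless of how many APIs it calls, which is what allows the vectors to be fed to a fixed-input model and rebalanced with an oversampling method such as SMOTE.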