The Android OS brings great convenience to users, but the accompanying malware threatens their property and privacy, so an effective method for detecting Android malware is necessary. In this research direction, traditional machine learning has improved feature-processing capability, but its feature extraction relies on expert experience and its accuracy is low. Traditional deep learning can mine features automatically, but it suffers from information loss or redundancy during feature extraction, and its neural networks have inadequate learning ability and receptive fields when processing complex spatio-temporal features. Meanwhile, among approaches to sample imbalance, balancing the data distribution by oversampling introduces redundant information, and increasing the weight of small-sample classes by adding control factors to the cross-entropy loss function struggles when multiple small-sample classes are imbalanced to different degrees. To address these problems, this paper proposes a feature preprocessing method and an optimized model for mining features in both the spatial and temporal dimensions, and, building on these two optimizations, an improved Android malware detection and classification method. The main work is as follows:

1) To address the excessive reliance of machine learning on expert experience and the information loss or redundancy of deep learning during feature extraction, a feature preprocessing method called histogram generation based on augmenting three dimensions of flows is proposed. It extracts three features (time, size, and direction) from the packets in a network flow, augments the traffic samples with a sliding time window, uses the MTU as the image coordinate axis to determine the image dimensions, and finally fills each packet of a flow into the image according to its three-dimensional features, producing the feature histogram for that flow.

2) To address the inadequate learning ability and receptive field of neural networks, as well as the information redundancy of data-level solutions to sample imbalance and the difficulty algorithm-level solutions have with multiple small-sample classes imbalanced to different degrees, this paper proposes a convolutional neural network called DR-D TSECNN. It first processes spatial features with two-dimensional convolutions that combine a residual structure with dilated convolution, and then processes temporal features with a TCN that contains a residual structure and dilated convolution. Finally, the network uses Equalized Focal Loss, which adds a focusing factor and a weighting factor, to assign different weights to each small-sample class independently.

3) This paper comprehensively evaluates the proposed method on three publicly available datasets. The experimental results show that, for detection and classification, the method outperforms mainstream machine learning and deep learning methods in accuracy and F-measure, with accuracies of 98.8%, 99.1%, and 97.5% and F-measures of 99.1%, 99.2%, and 96.4%, respectively. Under sample imbalance, it outperforms mainstream methods in F-measure and G-mean, with the F-measures given above and G-mean values of 99.1%, 98.3%, and 97.6%, respectively. This indicates that the proposed method can effectively detect and classify Android malware and can also cope with multiple small-sample classes that are imbalanced to different degrees in Android malware traffic.
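
The preprocessing step in contribution 1) can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows one plausible reading of the description. The function name `flow_histograms`, the window and stride lengths, the MTU value of 1500 bytes, the 0/1 direction encoding, and the axis layout (packet size on the x axis, arrival time within the window on the y axis, one channel per direction) are all assumptions made for illustration.

```python
import numpy as np

MTU = 1500  # assumed Ethernet MTU in bytes; also the default image side length


def flow_histograms(packets, window=60.0, stride=30.0, side=MTU):
    """Turn one flow into a list of 2-channel size/time histograms.

    packets: iterable of (timestamp_seconds, size_bytes, direction), where
    direction is 0 for outbound and 1 for inbound (an assumed encoding).
    Each sliding time window that contains packets yields one image, so a
    single flow is augmented into several samples.
    """
    pkts = np.asarray(sorted(packets), dtype=float)
    if pkts.size == 0:
        return []
    t = pkts[:, 0] - pkts[0, 0]            # time relative to the first packet
    size = np.clip(pkts[:, 1], 1, MTU)     # packet size limited to [1, MTU]
    direction = pkts[:, 2].astype(int)

    images = []
    start = 0.0
    while start <= t[-1]:
        mask = (t >= start) & (t < start + window)
        if mask.any():
            img = np.zeros((2, side, side), dtype=np.float32)
            # x: packet size scaled to [0, side - 1]
            x = ((size[mask] - 1) * (side - 1) // (MTU - 1)).astype(int)
            # y: arrival time inside the window scaled to [0, side - 1)
            y = ((t[mask] - start) / window * (side - 1)).astype(int)
            # count packets per (direction, time bin, size bin) cell
            np.add.at(img, (direction[mask], y, x), 1.0)
            images.append(img)
        start += stride
    return images
```

With the default `side=MTU`, each image is 2 x 1500 x 1500, so a smaller `side` is convenient for experiments. Overlapping windows (`stride < window`) are what produce the augmentation: a packet can contribute to more than one image.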