As the smartphone operating system with the largest user base, Android has attracted not only many developers who provide a rich supply of applications, but also a large number of attackers who profit from malware. Detecting Android malware accurately has therefore become an urgent problem in mobile security. Many studies use feature engineering to extract multi-dimensional Android features and then apply machine learning models to detect malware. However, feature engineering often loses some of the implicit semantic information in the features, which creates a bottleneck in detection performance or even degrades it, especially against evolving malware. To address these problems in related research, this thesis proposes a model that fully learns the implicit semantic information of multiple malware features and thereby improves detection performance.

The work of this thesis consists of three parts. First, an opcode feature extraction method based on a convolutional neural network is proposed. An opcode visualization method is designed in which the opcode sequences from each method's control flow graph are mapped to pixels, and a convolutional neural network learns a feature representation of the resulting image. Second, a permission feature extraction method based on a graph neural network is proposed. A heterogeneous graph of permissions, APIs and applications is constructed. To initialize the node features, API embedding vectors obtained from a Skip-Gram model are used as part of the feature vector of each API node, and a GraphSAGE model learns the rich semantic information in the heterogeneous graph. Third, a detection model based on multi-feature implicit semantic mining is proposed. By integrating the implicit semantic extraction methods for opcodes, APIs and permissions, a heterogeneous graph of opcode 2-grams, APIs, permissions and applications is constructed. The node features combine the opcode features extracted through visualization, the API embedding vectors, and the permissions. A GraphSAGE model then mines the implicit semantics in the heterogeneous graph to achieve better detection performance.

The interchangeable components of the above methods and models are compared experimentally to determine the best model structure. The results show that the proposed feature mining methods and detection model achieve better detection performance. These results offer a useful reference for feature learning methods and model design in Android malware detection.
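
The three building blocks named above (mapping opcodes to pixels, forming opcode 2-grams, and aggregating neighbor features in a graph) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the image width, the zero padding, and the toy graph are all assumptions made for illustration, and the aggregation step shows only the mean-and-concatenate part of a GraphSAGE-style layer, without learned weights or a nonlinearity.

```python
from typing import Dict, List, Sequence, Tuple


def opcodes_to_image(opcodes: Sequence[int], width: int = 4, pad: int = 0) -> List[List[int]]:
    """Lay out opcode byte values (0-255) as rows of grayscale pixels.

    The fixed row width and zero padding are illustrative assumptions.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = [op & 0xFF for op in opcodes]  # keep each value in one byte
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([pad] * (width - remainder))  # pad the last row
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def two_grams(opcodes: Sequence[int]) -> List[Tuple[int, int]]:
    """Return consecutive opcode pairs (2-grams) in sequence order."""
    return list(zip(opcodes, opcodes[1:]))


def mean_aggregate(
    features: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
    node: str,
) -> List[float]:
    """Concatenate a node's features with the mean of its neighbors' features.

    Assumes every node has a feature vector of the same length.
    """
    own = features[node]
    nbrs = neighbors.get(node, [])
    if not nbrs:
        mean = [0.0] * len(own)  # isolated node: use a zero neighborhood
    else:
        mean = [sum(features[n][k] for n in nbrs) / len(nbrs) for k in range(len(own))]
    return own + mean


# Hypothetical opcode byte values for a single method.
opcodes = [0x12, 0x6E, 0x0A, 0x1A, 0x0E]
image = opcodes_to_image(opcodes, width=4)
pairs = two_grams(opcodes)

# Toy heterogeneous graph: one application linked to one API and one permission.
features = {"app": [1.0, 0.0], "api": [0.0, 2.0], "perm": [2.0, 4.0]}
neighbors = {"app": ["api", "perm"]}
app_repr = mean_aggregate(features, neighbors, "app")
```

In a full pipeline, the image would be fed to a convolutional network and the aggregated vector would pass through a learned linear layer and activation. Both are omitted here to keep the sketch dependency-free.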