One of the most severe threats to cyber security is the botnet, which typically uses Command and Control (C&C) servers to direct bots to launch various cyber attacks. Attackers often use short-lived domains generated by a Domain Generation Algorithm (DGA) to communicate with their C&C infrastructure. To prevent the abuse of malicious DGA domains, DGA detection and classification play an important role in helping cyber security researchers take down botnet C&C servers. DGA detection has become a research focus in information security and in many other fields of computer science. In recent years, machine learning methods have been widely applied to DGA detection and classification. With the rapid development of deep learning in particular, deep-learning-based DGA detection algorithms have made some progress. However, these methods are built from relatively simple models, so their ability to extract useful features is limited, and their accuracy on multi-class DGA classification tasks can still be improved. Starting from word-embedding-based deep learning methods for DGA detection, this paper studies several aspects of the task and proposes new deep learning algorithms to improve the accuracy of DGA detection and classification. The main research contents of this paper are as follows:

(1) Research on a DGA domain name detection algorithm based on deep learning models with hybrid word embedding. A hybrid word embedding method that combines character-level and bigram-level embeddings is designed to make better use of the information contained in domain names, and a deep learning model using this hybrid embedding is built on top of it. Finally, comparison experiments against multiple models were conducted to evaluate the model. The results show that the model based on hybrid word embedding outperforms models based on character-level embedding in both DGA domain name detection and multi-class classification, especially for small DGA families with few samples, which indicates that the proposed approach is effective.

(2) Research on a DGA domain name detection algorithm using a Transformer network with hybrid word embedding. A modified Transformer network is designed to improve the extraction of effective features from domain name sequences, and a deep learning model combining this Transformer network with the hybrid embedding method is built to distinguish DGA domains from known legitimate domains. Finally, DGA domain name detection and classification experiments were carried out on the public OSINT and Alexa datasets, comparing the proposed algorithm with multiple models, including state-of-the-art DGA domain name detection and classification algorithms. The results show that the proposed approach is effective.

This paper has 14 figures, 10 tables and 103 references.
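To make the idea of a hybrid character-plus-bigram embedding more concrete, here is a minimal sketch under stated assumptions, not the author's model. It assumes the input is a lowercased domain label, that the overlapping bigram sequence is padded by one position so it aligns with the character sequence, and that the two embeddings are concatenated position by position. All helper names (`char_tokens`, `bigram_tokens`, `hybrid_embed`, etc.) and the dimensions are hypothetical. The randomly initialised lookup tables stand in for embedding layers that would be learned during training.

```python
import random
from typing import Dict, List, Sequence

PAD, UNK = "<pad>", "<unk>"  # reserved tokens: PAD -> id 0, UNK -> id 1


def char_tokens(domain: str) -> List[str]:
    """Character-level view of a domain label, e.g. 'abc' -> ['a', 'b', 'c']."""
    return list(domain)


def bigram_tokens(domain: str) -> List[str]:
    """Overlapping bigrams, padded by one token so the length matches char_tokens."""
    return [domain[i:i + 2] for i in range(len(domain) - 1)] + [PAD]


def build_vocab(token_lists: Sequence[List[str]]) -> Dict[str, int]:
    """Map every token seen in training to an integer id (0 and 1 are reserved)."""
    vocab = {PAD: 0, UNK: 1}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Truncate or zero-pad a token sequence to max_len ids; unseen tokens map to UNK."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def make_table(vocab_size: int, dim: int, seed: int) -> List[List[float]]:
    """Random stand-in for a trainable embedding matrix; the PAD row is all zeros."""
    rng = random.Random(seed)
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    table[0] = [0.0] * dim
    return table


def hybrid_embed(domain: str,
                 char_vocab: Dict[str, int], bigram_vocab: Dict[str, int],
                 char_table: List[List[float]], bigram_table: List[List[float]],
                 max_len: int) -> List[List[float]]:
    """Concatenate the character and bigram embeddings at each position."""
    c_ids = encode(char_tokens(domain), char_vocab, max_len)
    b_ids = encode(bigram_tokens(domain), bigram_vocab, max_len)
    return [char_table[c] + bigram_table[b] for c, b in zip(c_ids, b_ids)]


if __name__ == "__main__":
    train = ["google", "xjwqkzpl", "example"]
    char_vocab = build_vocab([char_tokens(d) for d in train])
    bigram_vocab = build_vocab([bigram_tokens(d) for d in train])
    char_table = make_table(len(char_vocab), 4, seed=0)
    bigram_table = make_table(len(bigram_vocab), 4, seed=1)
    vecs = hybrid_embed("google", char_vocab, bigram_vocab,
                        char_table, bigram_table, max_len=10)
    print(len(vecs), len(vecs[0]))  # 10 8
```

With this choice each position carries both the single character and the character pair starting at it, so the downstream network (for example a CNN, LSTM or Transformer encoder) sees a sequence of `max_len` vectors of width `char_dim + bigram_dim`. Summing the two embeddings instead of concatenating them would be an equally plausible design. Which one the thesis uses is not stated in this abstract.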