Research And Implementation Of Malicious Domains Detection Technology Based On Big Data Analysis

Posted on: 2019-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: C X Yin
GTID: 2348330545955598
Subject: Computer technology
Abstract/Summary:
Cyber security is an unavoidable concern. Attackers often use domain names as a vehicle for Internet attacks, for example to connect to Trojans or to carry botnet communication. Techniques such as Fast-Flux and DGA (domain generation algorithms) make these attacks stealthier and malicious domain names harder to recognize. Domain blacklists are of little use in this setting; analyzing a domain's DNS data is a more effective way to identify malicious domains. This thesis first surveys techniques for malicious domain detection, analyzes the difficulties the task faces, summarizes existing technical solutions and related research results, studies machine learning classification models and big data technology, and builds a big data analytics infrastructure with Hadoop, Spark, Kafka, and related tools. On this basis, it starts from DNS data and uses machine learning to construct a malicious domain detection model based on DNS behavior features. It analyzes the statistical distribution of DNS data from multiple perspectives and extracts 22 features in four dimensions. Cross-validation is used to compare two classification models, random forest and GBDT; the tests show that random forest has the advantage in accuracy, recall, and other metrics. Finally, a malicious domain detection system is designed and implemented on the big data platform, and the detection model is deployed in it. The system architecture takes into account input sources, data storage, execution efficiency, and scalability, and is divided into four functional modules. To keep the system stable and available on high-speed networks, several performance optimizations are adopted, including the following: a network traffic diversion model improves the ability to capture high-speed network traffic; Kafka configuration is tuned to absorb short-term surges in network traffic and to improve system throughput and stability; whitelist filtering removes records for popular legitimate domain names, which account for the majority of DNS data, reducing the processing load on downstream modules; and the preprocessing module aggregates domain information and periodically writes the results to MongoDB, reducing repeated reading and processing of HDFS data. The model and the system are deployed in a real network to detect malicious domains online. In testing, the system achieved good detection accuracy and detection efficiency.
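
The whitelist filtering and per-domain aggregation steps mentioned above can be pictured with a minimal Python sketch. It is not the thesis's implementation: the record layout (domain, resolved IP, TTL), the names `WHITELIST`, `DomainStats`, `is_whitelisted`, and `aggregate`, and the suffix-matching rule are all assumptions made for illustration. The system described in the abstract processes Kafka and HDFS data and writes aggregates to MongoDB, whereas this sketch only returns an in-memory dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical whitelist of popular legitimate domains (illustrative values only).
WHITELIST = {"example.com", "example.org"}


@dataclass
class DomainStats:
    """Per-domain aggregate; the chosen fields are assumptions, not the thesis's 22 features."""
    query_count: int = 0
    resolved_ips: set = field(default_factory=set)
    ttls: list = field(default_factory=list)


def normalize(domain):
    """Lower-case the name and drop a trailing root dot."""
    return domain.lower().rstrip(".")


def is_whitelisted(domain, whitelist=WHITELIST):
    """Return True if the domain or any of its parent domains is on the whitelist."""
    labels = normalize(domain).split(".")
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


def aggregate(records, whitelist=WHITELIST):
    """Drop whitelisted records, then aggregate the remaining ones per domain.

    `records` is an iterable of (domain, resolved_ip, ttl) tuples.
    """
    stats = defaultdict(DomainStats)
    for domain, ip, ttl in records:
        if is_whitelisted(domain, whitelist):
            continue  # popular legitimate domain: skip to reduce downstream load
        entry = stats[normalize(domain)]
        entry.query_count += 1
        entry.resolved_ips.add(ip)
        entry.ttls.append(ttl)
    return dict(stats)


if __name__ == "__main__":
    sample = [
        ("www.example.com", "192.0.2.10", 300),   # filtered by the whitelist
        ("xkqzv.test", "203.0.113.5", 60),
        ("xkqzv.test", "203.0.113.9", 60),
    ]
    for name, entry in aggregate(sample).items():
        print(name, entry.query_count, sorted(entry.resolved_ips))
```

Filtering before aggregation means the bulk of benign traffic never reaches the more expensive feature extraction and classification stages, which is the load-reduction argument the abstract makes.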
Keywords/Search Tags: malicious domains, machine learning, big data, online detection