Research On Malicious Domain Detection Under Hadoop Environment

Posted on: 2016-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: J F Zhang
Full Text: PDF
GTID: 2348330479454692
Subject: Computer technology
Abstract/Summary:
DNS is one of the basic services of the Internet. It maps domain names and IP addresses to each other, which makes the Internet easier to access. Besides normal network applications, a variety of attacks also depend on DNS, such as Fast-Flux. Analyzing DNS traffic yields abundant information about both normal and malicious activity, which makes it easier to detect and contain malicious domains.

Speed is essential to containing malicious activity. Current detection methods, such as those based on blacklists and those based on data-mining prediction, are affected by subjective judgment and by the time spent analyzing massive data sets, so they tend to lag well behind the early stage of a malicious domain's activity.

Under these circumstances, a method for detecting malicious domains based on DNS traffic is needed. The core idea of the proposed method is to use the parallel processing tools of the Big Data ecosystem, combining their performance benefits with accurate classification to improve both the effectiveness and the efficiency of malicious domain detection. By analyzing malicious and benign domains in terms of time series, DNS communications, TTL values, and the domain name itself, the method extracts features that efficiently distinguish malicious domains from benign ones, such as daily similarity, repeating patterns, the number of change points, and the percentage of digits in the domain name. It then classifies unknown domains quickly and accurately with a parallel weighted random forest classification algorithm.

We ran tests with the original parameters, with optimized parameters, and with improved models, and compared accuracy, recall, and precision across these settings. The results show that the classifier's performance benefits from both the parameter optimization and the model improvements. We then compared the running time of the programs under the MapReduce (M/R) framework and in a sequential environment to verify the time advantage of the M/R framework. Finally, we compared our model with a logistic regression model and a Naïve Bayes model to show that the proposed method obtains more accurate results on an imbalanced data set.
Keywords/Search Tags:Big Data, Malicious Domains, Detection Algorithm
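
Two of the quantities named in the abstract, the percentage of digits in a domain name and the accuracy, precision, and recall used for evaluation, can be illustrated with a short Python sketch. This is not the thesis's implementation: the function names, the choice to exclude dots from the character count, and the label convention (1 = malicious, 0 = benign) are assumptions made here for illustration only.

```python
def digit_percentage(domain: str) -> float:
    """Fraction of characters in a domain name that are digits.

    Assumption: dots are separators and are not counted as characters.
    """
    chars = [c for c in domain if c != "."]
    if not chars:
        return 0.0
    return sum(c.isdigit() for c in chars) / len(chars)


def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels.

    Assumption: 1 marks a malicious domain, 0 a benign one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    print(digit_percentage("a1b2.com"))
    # A classifier that labels every domain benign, on a 95/5 split.
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100
    print(classification_metrics(y_true, y_pred))
```

In the 95/5 example, the always-benign classifier reaches an accuracy of 0.95 while its recall is 0.0, which is one reason precision and recall are commonly reported alongside accuracy on imbalanced data.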