HTTP-Based Botnet Detection Using Network Traffic Traces

Posted on: 2016-03-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: TRUONG DINH TU
GTID: 1108330491463138
Subject: Computer system architecture
Abstract/Summary:
Botnets are generally recognized as one of the most serious threats on the Internet today because they serve as platforms for the vast majority of large-scale, coordinated cyber-attacks, such as distributed denial of service, spamming, and information theft. Detecting botnets is therefore of great importance, and security researchers have proposed many effective botnet detection approaches.

However, botnet developers constantly develop new techniques to improve their bots and evade detection. In recent years, HTTP-based botnets have become more widespread and have caused enormous damage to many government organizations and industries. New-generation HTTP botnets tend to use techniques such as domain generation algorithms (DGA), domain-flux, or fast-flux to avoid detection. Some botnets use domain-flux to avoid being blacklisted; others use fast-flux to hide the true location of their servers.

The main research objective of this dissertation is therefore to build solutions for detecting HTTP botnets whose operators use techniques such as DGA, domain-flux, or fast-flux to evade detection. To achieve this goal, the dissertation addresses three main problems: (1) detecting the presence of machines infected with domain-flux or DGA-based botnets inside an enterprise network or other monitored network; (2) detecting the C&C servers of botnets that use domain-flux or DGA-based evasion techniques; and (3) detecting malicious Fast-Flux Service Networks (FFSNs). The main contents of these three research works are summarized as follows.

The first problem is how to identify the presence of machines infected with domain-flux or DGA-based botnets inside the enterprise or monitored network. To answer this question, samples of multiple well-known domain-flux or DGA-based botnets are collected, including Kraken, Zeus, Conficker, Bobax, and Murofet.
We then execute these bot samples in a virtual machine environment to obtain network traffic traces. By examining a large number of these traces, we find that the botnets exhibit similar periodic behavior when querying domain names. In particular, machines infected with domain-flux or DGA-based botnets often query a large number of non-existent domain names, with similar periodic time-interval series, while looking for their C&C server. Normal legitimate hosts have no reason to query a large number of different domain names with similar periodic time-interval series and thereby generate high volumes of NXDOMAIN replies. This behavior occurs only on hosts infected with domain-flux or DGA-based botnets. Based on these characteristics, we propose a method that analyzes the correlation between each pair of query time-interval series in order to cluster similar domain names. The experimental results show that domain names generated by the same botnet or DGA are grouped into the same cluster. Hosts that tried to query the domains in these clusters are marked as compromised hosts running a given domain-flux or DGA-based botnet. This work does not aim to detect all bot-infected machines; it is effective only for detecting machines infected with domain-flux or DGA-based bots inside the monitored network. The results motivate us to consider a new method for detecting botnet C&C servers, which forms part of our subsequent research (Chapter 4).

The second problem is how to detect the C&C servers of domain-flux or DGA-based botnets. Several previous approaches [1-4] have addressed this threat with useful results. Yadav et al. [1] presented a technique to detect C&C domains of DGA-based botnets by looking at the distribution of unigrams and bigrams in all domain names.
However, the unigram- and bigram-based technique may not suffice, especially for detecting domains generated by the Kraken, Bobax, or Murofet botnets, because the distributions of unigrams and bigrams in the domains of these botnets do not differ significantly from those of benign domains. To overcome this limitation, our work improves and extends that of Yadav et al. [1]. We calculate the frequency of occurrence of n-grams (n = 3, 4, 5) in benign domain names and assign a score to each n-gram. To distinguish domains generated by legitimate users from those generated by botnets, we present a method to measure the expected score of a domain (ESOD) and combine it with two other features as input to a previously trained classifier that separates bot-generated domain names from human-generated ones. We train classifiers with five machine learning algorithms and evaluate the detection effectiveness of each. The experimental results show that the decision tree algorithm (J48) yields the best classifier, detecting botnets more efficiently than the other algorithms, and that our proposed approach can detect botnets in the monitored network efficiently. The details of the method are presented in Chapter 4 of the dissertation.

The final problem is how to detect malicious fast-flux service networks using feature-based machine learning classification techniques. Several approaches have been developed to detect FFSNs [5-8]. An FFSN is characterized by one or more domain names that resolve to multiple (hundreds or even thousands of) different IP addresses with short time-to-live values and rapidly changing DNS answers. The classification process therefore needs to rely on data gathered from queries sent by various users at completely unpredictable times.
The approaches proposed in [5-8] use a small amount of active DNS traffic traces, so they cannot obtain as many resolved IP addresses of malicious fast-flux networks as possible. This disadvantage may increase false positive and false negative rates. However, the limitation can be overcome by deploying passive DNS replication. In this study, we build a PassiveDNS tool that sniffs traffic from an interface or reads a pcap file and writes the DNS server answers to a log file (DNSlog). Passive DNS is a technique for reconstructing a partial view of the data available in the Domain Name System into a central database, where it can be indexed and queried. DNSlog databases are useful for a variety of purposes: they can answer questions that are difficult or impossible to answer with the standard DNS protocol, such as: Where did this domain name point in the past? What domain names are hosted by a given name server? What domain names point into a given IP network? What subdomains exist below a certain domain name? We also define a DNSlog data aggregate to facilitate tracking and management of the query/response information related to each domain.

Moreover, Holz et al. [7] focus on just three features derived from active DNS queries (the number of DNS "A" records, the number of DNS "NS" records, and the number of distinct Autonomous Systems (ASes)), and Passerini et al. [8] employ 9 features, whereas we use 16 key features to train classifiers. Of these 16 features, 12 are first proposed in this dissertation. The advantage of our approach is that it can effectively detect a wide range of fast-flux domains, including malware domains. The experimental results show that our method produces a lower false positive rate (FPR) of 0.13%, compared with 6.17% for [7] and 4.08% for [8]. The details of the method are presented in Chapter 5 of this dissertation.
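
To make the idea of a per-domain DNSlog aggregate concrete, the following is a minimal sketch in Python, not the dissertation's actual tool or feature set. It assumes a hypothetical log format of (domain, record type, value, TTL) tuples and a hypothetical `ip_to_asn` lookup table, and it computes only the three features attributed above to Holz et al. [7] plus the minimum TTL. The names `DomainAggregate`, `aggregate`, and `features` are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DomainAggregate:
    """Per-domain summary of passive DNS answers (hypothetical structure)."""
    a_ips: set = field(default_factory=set)      # distinct IPs seen in "A" answers
    ns_names: set = field(default_factory=set)   # distinct servers seen in "NS" answers
    asns: set = field(default_factory=set)       # distinct AS numbers of the "A" IPs
    min_ttl: Optional[int] = None                # smallest TTL seen for this domain


def aggregate(records, ip_to_asn):
    """Group (domain, rtype, value, ttl) log records by domain.

    `ip_to_asn` is an assumed dict mapping an IP string to an AS number;
    IPs missing from it simply contribute no AS.
    """
    aggs = defaultdict(DomainAggregate)
    for domain, rtype, value, ttl in records:
        if rtype not in ("A", "NS"):
            continue  # other record types are ignored in this sketch
        agg = aggs[domain.lower()]
        if rtype == "A":
            agg.a_ips.add(value)
            asn = ip_to_asn.get(value)
            if asn is not None:
                agg.asns.add(asn)
        else:
            agg.ns_names.add(value.lower())
        agg.min_ttl = ttl if agg.min_ttl is None else min(agg.min_ttl, ttl)
    return dict(aggs)


def features(agg):
    """Turn one aggregate into a flat feature dict for a classifier."""
    return {
        "num_a": len(agg.a_ips),
        "num_ns": len(agg.ns_names),
        "num_asn": len(agg.asns),
        "min_ttl": agg.min_ttl,
    }


if __name__ == "__main__":
    # Made-up records using documentation IP ranges and private-use AS numbers.
    log = [
        ("flux.example", "A", "192.0.2.1", 60),
        ("flux.example", "A", "198.51.100.7", 30),
        ("flux.example", "NS", "ns1.flux.example", 300),
    ]
    asn_table = {"192.0.2.1": 64500, "198.51.100.7": 64501}
    for name, agg in aggregate(log, asn_table).items():
        print(name, features(agg))
```

Because the aggregate accumulates answers over time, a fast-flux domain observed passively would be expected to show growing A-record and AS counts with low TTLs, which is the kind of signal the feature-based classifiers described above rely on.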
Keywords/Search Tags: HTTP botnet, C&C Server, Domain Generation Algorithm (DGA), Domain-Flux, Fast-Flux