
Analysis Of Internet Access Pattern Based On The DNS Log

Posted on: 2010-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Ji
Full Text: PDF
GTID: 2518303065475954
Subject: Information and Communication Engineering
Abstract/Summary:
The Domain Name System (DNS), which converts between IP addresses and domain names, is part of the Internet's infrastructure and the basis of many other Internet applications. All IP-based Internet services use the DNS to locate the corresponding resources. DNS logs therefore record the domain names queried by users and contain a great deal of information, which makes them a new source for analyzing Internet access patterns. However, because DNS logs are difficult to obtain and the volume of log data is enormous, current studies of DNS logs focus on the performance and configuration of the server itself. There has been relatively little research on the behavior of Internet users based on DNS logs.
In this thesis, Internet access patterns were studied using several days of DNS logs from a CN node. The data were provided by the China Internet Network Information Center, which is responsible for managing the national domain name resources. The work consists of three parts.
First, a data compression method was designed that reduces the volume of data while retaining the valid data. Because the DNS log is very large (about 200 GB per day), reducing the amount of data before analysis is necessary.
Second, statistical regularities were analyzed using the preprocessed data. The number of accesses per domain name was found to follow Zipf's law: about 5% of the sites can meet more than 90% of Internet users' needs. Users' queries follow a stretched exponential distribution, which lies between a power law and an exponential distribution. This result suggests that users' choice of recursive DNS server for querying domain names combines randomness and determinism.
Finally, the thesis focuses on clustering analysis of the DNS data. A feature extraction method for IP addresses and domain names was designed, and the data were then clustered with the K-means algorithm and the BIRCH algorithm.
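The clustering step described above can be illustrated with a minimal sketch. The abstract does not specify the feature extraction method, so the features here are an assumption: each IP address is represented by a normalized 24-bin histogram of its queries by hour of day. The record format `(ip, hour)`, the function names, and the sample addresses are all hypothetical, and the K-means routine is a plain Lloyd's iteration, not the thesis implementation (BIRCH is not shown).

```python
from collections import defaultdict


def hourly_profile(records):
    """Build a normalized 24-bin query histogram per client IP.

    records: iterable of (ip, hour_of_day) pairs (assumed log format).
    """
    counts = defaultdict(lambda: [0] * 24)
    for ip, hour in records:
        counts[ip][hour % 24] += 1
    profiles = {}
    for ip, bins in counts.items():
        total = sum(bins)
        profiles[ip] = [b / total for b in bins]
    return profiles


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=20):
    """Lloyd's K-means; the caller supplies initial centroids for determinism."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic records: two daytime clients and two night-time clients.
records = (
    [("10.0.0.1", h) for h in (9, 10, 11, 14, 15)]
    + [("10.0.0.2", h) for h in (9, 10, 13, 16)]
    + [("10.0.0.3", h) for h in (22, 23, 0, 1)]
    + [("10.0.0.4", h) for h in (23, 0, 1, 2)]
)
profiles = hourly_profile(records)
ips = sorted(profiles)
points = [profiles[ip] for ip in ips]
labels, centroids = kmeans(points, [points[0], points[2]])
print(dict(zip(ips, labels)))
```

Normalizing each histogram makes the clustering sensitive to when an address is active rather than to how many queries it sends, which is one plausible way to compare temporal behavior across addresses with very different volumes.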
The results show that the temporal behavior of IP addresses differs greatly and falls into three main patterns. A similar analysis of the .CN domain names identified the domain names that truly reflect the needs of the majority of users. The results of this research can support the construction of a hierarchy of domain names and the prioritized processing of DNS queries, leading to more effective resolution and administration of CN domain names.
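The statistical claims earlier in the abstract (a Zipf-like distribution of domain accesses, and a small fraction of sites covering most queries) can be checked on any table of per-domain query counts. The following is a minimal sketch, not the thesis procedure: it estimates the slope of the rank-frequency curve on log-log axes by ordinary least squares and computes the share of queries covered by the top fraction of domains. The function names and the synthetic counts are assumptions for illustration only.

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 on log-log axes is the classic Zipf signature.
    """
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def top_share(counts, fraction=0.05):
    """Share of all queries that go to the top `fraction` of domains."""
    freqs = sorted(counts, reverse=True)
    k = max(1, math.ceil(len(freqs) * fraction))
    return sum(freqs[:k]) / sum(freqs)


# Synthetic counts following f(rank) = 10000 / rank (not thesis data).
synthetic = [10000 / rank for rank in range(1, 1001)]
slope = zipf_slope(synthetic)
share = top_share(synthetic)
print(f"slope = {slope:.3f}, top 5% share = {share:.3f}")
```

On real logs the counts would come from aggregating queries per domain name; the fitted slope and the top-5% share then depend on the data and need not match the synthetic example.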
Keywords/Search Tags: DNS server, Log analysis, Access pattern, Clustering, CN ccTLD