Analysis And Research Of User Access Behavior Based On DNS Log

Posted on: 2020-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: J D Wei
GTID: 2428330575994856
Subject: Computer technology
Abstract/Summary:
Many universities in China have built their own campus networks, bringing informatization and Internet access to education. A fast, convenient campus network gives teachers and students a wealth of resources and broadens students' horizons. When users go online they generate large volumes of access data, and mining valuable information from this data has become an active research topic in recent years. The main content of this thesis is extracting valuable information from complex logs, then analyzing and modeling it. The data source is the DNS log of the university's information center. The main work is as follows:

(1) Filtering and cleaning the raw log data. Duplicate and useless records are removed to lay the foundation for later analysis.

(2) Classification of the domain names users access. The domain name accessed is a very important field in the DNS log, and classifying it reveals part of a user's network access characteristics. This thesis classifies domain names with a combination of a domain name classification library and a domain name classifier. The library is built by crawling domain name classification websites with a web crawler. The classifier is trained with machine learning algorithms on a large number of domain names already labeled by topic category; its main function is to classify domain names that have no match in the library.

(3) Clustering of user network access features. A behavior feature vector for each user is obtained by tagging the domain names the user accesses, and cluster analysis on these vectors yields the access characteristics of different user groups. This thesis analyzes the shortcomings of the K-means clustering algorithm and combines the Canopy algorithm with K-means to cluster users. Because the data set is large and high-dimensional, the thesis implements a distributed K-means clustering algorithm based on the MapReduce programming framework. Experiments show that the algorithm clusters users effectively according to their characteristics.

(4) Statistical analysis of user network behavior characteristics. The thesis analyzes students' online behavior from several angles, including user access activity in different time periods, topic analysis of accessed domain names, domain name traffic analysis, and feature analysis of each user group, presenting users' access characteristics along multiple dimensions.

By mining and analyzing the DNS log of the information center of Beijing Jiaotong University, this thesis obtains users' online habits and access preferences and, finally, students' network behavior characteristics. The aim is to guide students toward reasonable use of the network, to support better network service for the whole university, and to give campus administrators a basis for understanding students' network usage.
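The combination of Canopy and K-means described in item (3) can be illustrated with a small single-machine sketch. It is not the thesis's distributed MapReduce implementation. The thresholds `t1` and `t2`, the two-dimensional feature vectors, and all function names are assumptions made for illustration. The idea shown is that a cheap Canopy pass picks both the number of clusters and the initial centers, which addresses K-means's sensitivity to the choice of k and to random initialization.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def canopy_centers(points, t1, t2):
    """Canopy pass with a loose threshold t1 and a tight threshold t2 (t1 > t2).

    Returns one center per canopy; the count doubles as k for K-means.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        # Every point within t1 of the seed belongs to this canopy.
        members = [seed] + [p for p in remaining if distance(seed, p) < t1]
        centers.append(mean(members))
        # Points within t2 are too close to start a canopy of their own.
        remaining = [p for p in remaining if distance(seed, p) >= t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard K-means iterations starting from the given centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = mean(members)
    return centers, labels


# Hypothetical user feature vectors: share of requests in two made-up
# domain categories (e.g. "study", "entertainment").
users = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
         (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]

initial = canopy_centers(users, t1=0.6, t2=0.3)
centers, labels = kmeans(users, initial)
print(len(centers), labels)
```

In a MapReduce setting, the assignment step would typically run in the mappers and the center update in the reducers; that split is a common pattern and an assumption here, not a detail stated in the abstract.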
Keywords/Search Tags: DNS Log, Classification, Clustering, User Characteristics, K-means