User Grouping Based On DNS Visit Records Mining

Posted on: 2014-01-22, Degree: Master, Type: Thesis
Country: China, Candidate: K Yang
GTID: 2248330395983813, Subject: Information security
Abstract/Summary:
With the rapid growth of the Internet, the amount of information is increasing at a very high speed, and vast amounts of data are being accumulated. Data mining is the process of discovering useful information in large volumes of data. Web mining can be seen as data mining applied to the Web. DNS mining belongs to Web mining and is a kind of Web log mining. DNS visit records reflect the network user’s intentions. Through DNS mining we can discover a network user’s browsing pattern, which supports applications such as user grouping.

First, this paper studies the classic Apriori association rule mining algorithm in depth and addresses its shortcomings. The improved algorithm is then used to mine DNS visit records, which yields a series of fragmented user characteristics. To meet the needs of further research, a domain name classification mechanism is introduced, and this paper proposes the concept of the fingerprint of a user’s browsing pattern and analyzes what it should contain.

Second, this paper details the classic K-Means clustering algorithm, analyzes its principle and shortcomings, and describes the efforts and achievements of domestic and foreign scholars. The K-Means algorithm is then improved by combining a clustering validity function with the random restart method, so that the fingerprints of users’ browsing patterns can be clustered for follow-up research.

Third, this paper studies decision tree classification techniques and analyzes overfitting, which is the most common and important problem. After a detailed analysis of its causes, a "selecting and pruning" method is proposed to address overfitting and is used to improve the classic C4.5 decision tree algorithm. Experiments demonstrate the effectiveness of this method, which is then applied to DNS mining.

Finally, based on the research above, a DNS mining program is designed. Using the domain name classification mechanism, it generates fingerprints of users’ browsing patterns, clusters the fingerprints with the improved K-Means algorithm, and then uses the C4.5 decision tree algorithm improved by the "selecting and pruning" method to complete the user grouping function. The test results are analyzed in depth and the research is summarized. The paper closes with views on the prospects of DNS mining.
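
The fingerprint and clustering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The domain-to-category table, the category list, and all function names are hypothetical. The fingerprint is assumed to be a vector of per-category visit shares. The within-cluster sum of squared distances stands in for the clustering validity function, whose actual definition the abstract does not give.

```python
import random
from collections import Counter

# Hypothetical domain-to-category table (assumption; the thesis's own
# domain name classification is not described in the abstract).
CATEGORY_OF = {
    "news.example": "news",
    "video.example": "video",
    "shop.example": "shopping",
}
CATEGORIES = ["news", "video", "shopping"]


def fingerprint(domains):
    """Turn one user's queried domains into a vector of category shares."""
    counts = Counter(CATEGORY_OF[d] for d in domains if d in CATEGORY_OF)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CATEGORIES)
    return [counts[c] / total for c in CATEGORIES]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, iterations=50):
    """One K-Means run from random initial centers; returns (labels, cost)."""
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    # Stand-in validity score (assumption): within-cluster sum of squares.
    cost = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, cost


def kmeans_restarts(points, k, restarts=10, seed=0):
    """Random restart: run K-Means several times and keep the lowest cost."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[1])


users = {
    "u1": ["news.example"] * 3 + ["video.example"],
    "u2": ["news.example"] * 4,
    "u3": ["shop.example"] * 3 + ["video.example"],
    "u4": ["shop.example"] * 4,
}
points = [fingerprint(domains) for domains in users.values()]
labels, cost = kmeans_restarts(points, k=2)
```

In this toy data, users u1 and u2 mostly query news domains and u3 and u4 mostly query shopping domains, so the two pairs end up in different clusters. A decision tree could then be trained on the resulting cluster labels to group new users, as the abstract outlines.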
Keywords/Search Tags: DNS visit records mining, Association Rules, Browsing Pattern, Clustering, Decision Tree