User Grouping Based On DNS Visit Records Mining

Posted on:2014-01-22

Degree:Master

Type:Thesis

Country:China

Candidate:K Yang

Full Text:PDF

GTID:2248330395983813

Subject:Information security

Abstract/Summary:

With the rapid growth of the Internet, the volume of information is increasing at a very high rate, and vast amounts of data are being accumulated. Data mining is the process of discovering useful information in large volumes of data. Web mining can be seen as data mining applied to the Web. DNS mining belongs to Web mining and is a kind of Web log mining. DNS visit records reflect the intentions of network users. Through DNS mining we can discover users' browsing patterns, which can then be applied to user grouping.

First, this thesis studies the classic Apriori association rule mining algorithm in depth and addresses its shortcomings. The improved algorithm is then used to mine the DNS visit records, yielding a series of fragmented user characteristics. To meet the needs of further research, a domain name classification mechanism is introduced, and the concept of a fingerprint of the user's browsing pattern is proposed, together with an analysis of what it should contain.

Second, this thesis details the classic K-Means clustering algorithm, analyzes its principles and shortcomings, and describes the efforts and results of domestic and foreign scholars. A cluster validity function combined with the random restart method is then used to improve the K-Means algorithm so that it can cluster the fingerprints of users' browsing patterns for follow-up research.

Third, this thesis studies decision tree classification techniques in depth and analyzes overfitting, their most common and important problem. After a detailed analysis of its causes, a "selecting and pruning" method is proposed to address overfitting and is used to improve the classic C4.5 decision tree algorithm. Experiments demonstrate the effectiveness of this method, and it is applied to DNS mining.

Finally, building on the above work, a DNS mining program is designed. Using the domain name classification mechanism, it generates fingerprints of users' browsing patterns, clusters the fingerprints with the improved K-Means algorithm, and then uses the C4.5 decision tree algorithm improved by the "selecting and pruning" method to complete the user grouping function. The test results are analyzed in depth, the research is summarized, and views on the prospects of DNS mining are given.
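
The fingerprint and clustering steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation. The domain-to-category table, the category list, and all function names are hypothetical, and the sum of squared errors (SSE) stands in for the cluster validity function, which the abstract does not specify. Each user's fingerprint is taken to be the fraction of lookups per domain category, and K-Means is restarted from several random initializations, keeping the run with the lowest SSE.

```python
import random
from collections import Counter

# Hypothetical domain-to-category table; the thesis's real scheme is not given.
CATEGORY_OF = {
    "news.example": "news",
    "video.example": "video",
    "shop.example": "shopping",
}
CATEGORIES = ["news", "video", "shopping"]


def fingerprint(domains):
    """Fraction of a user's DNS lookups that fall into each category."""
    counts = Counter(CATEGORY_OF[d] for d in domains if d in CATEGORY_OF)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CATEGORIES)
    return [counts[c] / total for c in CATEGORIES]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One K-Means run from a random initialization; returns labels, centers, SSE."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, centers, sse


def kmeans_restarts(points, k, restarts=10, seed=0):
    """Random restart: run K-Means several times and keep the lowest-SSE result."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[2])


# Toy DNS visit records (invented for illustration only).
users = {
    "u1": ["news.example"] * 9 + ["video.example"],
    "u2": ["news.example"] * 8 + ["shop.example"] * 2,
    "u3": ["video.example"] * 9 + ["news.example"],
    "u4": ["video.example"] * 10,
}
points = [fingerprint(records) for records in users.values()]
labels, centers, sse = kmeans_restarts(points, k=2)
```

On this toy data the news-heavy users (u1, u2) and the video-heavy users (u3, u4) fall into separate clusters. In the pipeline the abstract describes, cluster labels of this kind would then be used by the decision tree stage to complete user grouping.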

Keywords/Search Tags:

DNS visit records mining, Association Rules, Browsing Pattern, Clustering, Decision Tree

Related items

1	Algorithm For Mining Association Rules Based On Clustering
2	Design About Association Rules Mining Based On Items Clustering And Transaction Tree
3	The Research On The Related Problems Of Association Rule Mining
4	The Study And Application Of Media Item Communications Analysis With Association Rules And Decision Tree Algorithm
5	Study On Association Rules Mining Algorithm Based On FP-tree
6	Research On Association Rules Mining Algorithm Based On Closed Pattern
7	Research And Optimization Of Association Rules Based On Can Tree
8	The Application And Research Of Decision Tree And Association Rules In The Analysis Of Drug Sale
9	Research On Key Algorithms Of Electronic Health Records Text Mining
10	The Log Pattern Cluster Mining Algorithm Based On Prefix Tree