Algorithm Research Based On Counting For Mining Frequent Items Over Network Traffic Measurement

Posted on:2015-05-22

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Li

Full Text:PDF

GTID:2298330422470732

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

Heavy-hitters identification is a process of query frequent items in network trafficdata stream, in recent years has become a research hotspot in the field of data streammining. Network traffic has such characters as high speed, uncertain, time-varying,and Heavy-hitters identification accuracy of the query result is higher, the overhead oftime and space have some limitation of the algorithm. Therefore, the current datastream algorithm for mining frequent item remains low recognition efficiency andpoor real-time performance and performance needs to improvement.First, in view of the calculation formula and clear strategy of PLC (ProbabilisticLossy Counting) algorithm, further optimize the overhead of time and space, thispaper proposes a space optimization heavy-hitters identification algorithm DPLC(Delta Probabilistic Lossy Counting) based on counting. The algorithm introduced twoconcepts of change degree and steady-state value, measure to the window error valueof calculated whether to steady-state, after reaching steady state can save time, thusreducing the time overhead. In addition, at the delete border adopt stricter error value,distinguish to treat whether or not to use the calculated window error to replace thetable entry’s estimate error, can effectively reduce the storage costs.Secondly, in view of the LC (Lossy Counting) algorithm has more storageoverhead and low identification precision, and PLC algorithm to improve therecognition accuracy but possible under-reporting problems, further optimizationalgorithm performance, this paper proposes a nonlinear lossy counting crowdrecognition algorithm NLC (Nonlinear Lossy Counting). Non-linear function wereused as the statistical value, and delete values, through effective control algorithm ofstorage overhead to reduce the output of false positives and avoid omissions. It isworth noting that the parameter values of the nonlinear function whether selectappropriate will affect the NLC algorithm performance, through experimental analysissummarizes the nonlinear function between the values of the parameters andpackage distribution, support. Setting different parameter values of nonlinear function can change the NLC algorithm of storage overhead, and then change the identificationresults of false positives and false negatives.Finally, the synthetic data set and the real traffic data to assess the performance ofthe algorithm set by experiment. Experimental results demonstrate that the proposedalgorithm has obvious advantages in recognition accuracy, time overhead, and storageoverhead. In this paper, experiments were carried out under MyEclipse platform, usingJava language.

Keywords/Search Tags:

data stream, frequent items, heavy-hitters identification, lossy counting, space optimization, nonlinear

PDF Full Text Request

Related items

1	Counting-based Algorithms For Detecting Frequent Items In Network Traffic
2	Research On Frequent Items Problem Using Lower Bound In Massive Data Stream
3	Methods For Mining Frequent Items Over Data Stream Base On Time Windows
4	Research On Algorithms For Mining Frequent Patterns In Data Streams
5	Research On Count-based Algorithm For Mining Frequent Items Over Data Stream
6	Research On Frequent Items Mining And Clustering Algorithms Of Data Stream
7	A Method Based On The Vertical Division Of Data Stream Frequent Itemset Mining Algorithm
8	Research On Collaborative Monitoring Of Hierarchical Heavy Hitters Based On SDN
9	Research On Data Items Frequent Itemsets Mining Algorithm Based On Sliding Window
10	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams