Font Size: a A A

Algorithm Research Based On Counting For Mining Frequent Items Over Network Traffic Measurement

Posted on:2015-05-22Degree:MasterType:Thesis
Country:ChinaCandidate:H Y LiFull Text:PDF
GTID:2298330422470732Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Heavy-hitters identification is a process of query frequent items in network trafficdata stream, in recent years has become a research hotspot in the field of data streammining. Network traffic has such characters as high speed, uncertain, time-varying,and Heavy-hitters identification accuracy of the query result is higher, the overhead oftime and space have some limitation of the algorithm. Therefore, the current datastream algorithm for mining frequent item remains low recognition efficiency andpoor real-time performance and performance needs to improvement.First, in view of the calculation formula and clear strategy of PLC (ProbabilisticLossy Counting) algorithm, further optimize the overhead of time and space, thispaper proposes a space optimization heavy-hitters identification algorithm DPLC(Delta Probabilistic Lossy Counting) based on counting. The algorithm introduced twoconcepts of change degree and steady-state value, measure to the window error valueof calculated whether to steady-state, after reaching steady state can save time, thusreducing the time overhead. In addition, at the delete border adopt stricter error value,distinguish to treat whether or not to use the calculated window error to replace thetable entry’s estimate error, can effectively reduce the storage costs.Secondly, in view of the LC (Lossy Counting) algorithm has more storageoverhead and low identification precision, and PLC algorithm to improve therecognition accuracy but possible under-reporting problems, further optimizationalgorithm performance, this paper proposes a nonlinear lossy counting crowdrecognition algorithm NLC (Nonlinear Lossy Counting). Non-linear function wereused as the statistical value, and delete values, through effective control algorithm ofstorage overhead to reduce the output of false positives and avoid omissions. It isworth noting that the parameter values of the nonlinear function whether selectappropriate will affect the NLC algorithm performance, through experimental analysissummarizes the nonlinear function between the values of the parameters andpackage distribution, support. Setting different parameter values of nonlinear function can change the NLC algorithm of storage overhead, and then change the identificationresults of false positives and false negatives.Finally, the synthetic data set and the real traffic data to assess the performance ofthe algorithm set by experiment. Experimental results demonstrate that the proposedalgorithm has obvious advantages in recognition accuracy, time overhead, and storageoverhead. In this paper, experiments were carried out under MyEclipse platform, usingJava language.
Keywords/Search Tags:data stream, frequent items, heavy-hitters identification, lossy counting, space optimization, nonlinear
PDF Full Text Request
Related items