Font Size: a A A

Research And Analysis Of Quantile Estimation Algorithm For Large-scale Network Data

Posted on:2019-07-25Degree:MasterType:Thesis
Country:ChinaCandidate:Z LinFull Text:PDF
GTID:2348330542498257Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
With the rapid development of the Internet,processing and querying massive amounts of network data is meeting the challenge.We are no longer satisfied with the batch processing of offline data,but require the data stream to be processed in nearly real-time calculation and analysis.When analyzing the distribution characteristics of network data streams,it is most common to descriptions of the distribution of data in terms of the cumulative distribution,which is characterized by the quantiles of the data.As a common method in data management and analysis,although quantile estimation has attracted a great deal of research over the past half century,especially in the case where the data is described incrementally and we must compute the quantiles in an online,streaming model,the current quantile estimation algorithms still suffer from the following disadvantages when calculating quantiles of network traffic:(1)The network data usually show obvious heavy-tailed distribution,and the estimation errors of the fractional tail bits with sparse data distribution will be obviously increased.(2)The network data with low expansibility and huge capacity far exceeds the throughput of the traditional algorithm,Difficult to meet the real-time.In order to solve the problem of quantile estimation in large-scale skewed network data stream,first of all,in this thesis,we study and analyze the proposal,process,advantages and disadvantages of several traditional quantile estimation algorithms in detail.Second,we proposed and designed a novel Incremental Quantile estimation with Nonlinear-interpolation(referred to as the IQN algorithm below),which uses the Gaussian Mixture Model to fit the cumulative distribution function of the data stream so as to reduce the estimation error caused by the fractional digits in the tail interpolation.Then,the experimental results show that the IQN algorithm has higher estimation accuracy than the traditional quantile estimation algorithm for the data set with heavy tail distribution.Finally,we build a parallel IQN algorithm based on Spark in order to verify its scalability and accuracy,moreover designed and deployed a high-performance real-time network delay quantile estimation system based on the streaming computing systems such as Kafka and Spark Streaming in the actual network environment,validated based on IQN The algorithm calculates the reliability of quantiles in a large-scale network data stream.To sum up,the IQN algorithm proposed in this thesis is of considerable referential importance for the real-time calculation of the quantile series of network data stream.
Keywords/Search Tags:streaming algorithm, quantile estimation, parallel algorithm, distributed computing system
PDF Full Text Request
Related items