The Research On A Few Key Issues In Querying Algorithms Over Data Streams

Posted on: 2009-12-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W M Li
GTID: 1118360242972706
Subject: Control theory and control engineering
Abstract/Summary:
The data stream model has recently attracted attention for its applicability to many types of data, including network monitoring, sensor networks, and financial services. In contrast to the data handled by standard relational database technology, a data stream is a real-time, continuous, ordered sequence of items, so it is not feasible to control the order in which items arrive, nor is it possible to store a stream locally in its entirety. Online analysis algorithms must therefore be fast while working within limited system resources. We have studied the following problems over data streams:

1. Effective similarity search over data streams is important for the applications above. In this dissertation, we propose similarity search over data streams based on Linear Predictive Coding (LPC) cepstral coefficients using Dynamic Time Warping (DTW). We aim to solve two key problems in similarity search: capturing the important information or energy of the data stream with fewer coefficients, and choosing an effective distance metric. LPC is a tool that uses the information of a linear predictive model. Compared with traditional approaches, such as similarity search based on the Discrete Fourier Transform (DFT) and the Discrete Wavelet Transform (DWT), the proposed method can use fewer coefficients to capture the desired features of a data stream for similarity search. For the distance metric, replacing the Euclidean distance with DTW can bring better performance. The experimental results demonstrate that the proposed method is better than the traditional approaches.

2. We propose a novel method for processing and describing data streams: B-clipped. The method clips data in two phases: dimensionality reduction using piecewise aggregate approximation (PAA), and the B-clipped process, which clips the real-valued series by bisecting the value field. Through experiments, we demonstrate that the B-clipped method obtains higher-quality solutions in less time than the M-clipped method, which clips the real-valued series at its mean, and than unclipped methods. The difference is especially pronounced when the streaming time series contain outliers.

3. AR* models comprise the classical autoregressive moving average (ARMA) and generalized autoregressive conditional heteroscedastic (GARCH) models, both of which are popular in time series analysis. Recent research on forecasting with the generalized regression neural network (GRNN) suggests that GRNN can be a promising alternative to linear and nonlinear time series models. In this dissertation, a model combining AR* and GRNN is proposed to exploit the strengths of both models in linear and nonlinear modeling. In the AR*-GRNN model, AR* modeling helps improve the combined model's forecasting performance by capturing statistical and volatility information from the time series. The experiments show that the combined model can forecast better than either model used separately.

4. We develop a new load shedding scheme based on AR*-GRNN. When a bursty data stream arrives, the available capacity may not supply enough resources to process it, and overload situations are usually unpredictable. A better load shedding plan is therefore needed to handle unexpected resource bottlenecks and performance degradation in the system. We introduce LPC to obtain feature values using fewer coefficients. The AR*-GRNN model is used to forecast the feature values of the data being shed, and QoS information is used to build a better load shedding plan so that analysis of the data stream can continue under overload. Experimental results indicate that the load shedding scheme based on AR*-GRNN is an effective way to improve query quality when the system is overloaded.

In this dissertation, we studied four problems over data streams and proposed new methods for them. Through a series of experiments comparing them with previous work, we conclude that the proposed methods outperform earlier approaches and are effective improvements on current processing methods over data streams.
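
To illustrate why the abstract prefers DTW to the Euclidean distance for similarity search, the following is a minimal sketch of the textbook DTW recurrence in Python. It is not the dissertation's implementation: the function names, the absolute-difference local cost, and the sample series are assumptions chosen for illustration, and the sketch works on raw values rather than on LPC cepstral coefficients.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference |a[i] - b[j]|.
    Runs in O(len(a) * len(b)) time and keeps only two rows in memory.
    """
    n, m = len(a), len(b)
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance; requires equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical sample data: the same pulse, shifted by one time step.
pulse = [0, 1, 2, 1, 0, 0]
shifted = [0, 0, 1, 2, 1, 0]

dtw_result = dtw_distance(pulse, shifted)
euclid_result = euclidean_distance(pulse, shifted)
```

On this pair the Euclidean distance penalizes the one-step shift, while DTW can warp the time axis to align the two pulses, which is the behavior that makes it attractive for comparing streams that are similar in shape but not aligned in time.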
Keywords/Search Tags: data stream, similarity search, LPC-DTW, B-clipped, AR*-GRNN, load shedding