Research On Some Key Issues Of Synopsis & Summary And Data Streams Analyzing

Posted on: 2007-07-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Wang
GTID: 1118360212465610
Subject: Computer application technology
Abstract/Summary:
As an integrated technique, data stream management and analysis has recently become a research hotspot. Because it can represent the original features of data directly, process heterogeneous data, capture information rapidly, and respond to user queries in a timely manner, data stream management and analysis has attracted great attention in many applications, such as data processing in sensor networks, securities management, Internet traffic monitoring, and online analysis of web usage logs or call records. Data streams are continuous, time-varying, ordered sequences of items. A query executed on data streams, called a long-running continuous (persistent) query, generally returns results incrementally as new items stream into the system. Managing and analyzing these continuous data streams brings unique opportunities, but also new challenges.

With the development of technologies such as sensors, communication, and pervasive computing, many industries today have more than very large databases: they obtain data streams that grow without limit at a rate of several million records per day. In some typical applications, for example the power industry, the requirements for rapid processing and precise analysis have become increasingly demanding. Accurate and fast analysis, together with efficient and adaptive data stream management techniques, is an important breakthrough toward practical, industrial-scale deployment. Typical management and analysis of industrial data streams includes acquiring and preprocessing the raw data, extracting characteristics of the streaming data, executing basic continuous queries such as select, join, and aggregation, and performing complex analyses such as correlation detection and classification.
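A continuous query of the kind described above can be illustrated with a sliding-window aggregate. The sketch below is not from the thesis; it assumes a count-based window of the last `window` items and a hypothetical class `SlidingWindowAverage`, and it returns an updated AVG each time an item arrives instead of recomputing over the whole window.

```python
from collections import deque


class SlidingWindowAverage:
    """Continuous AVG over the last `window` items, updated incrementally.

    Hypothetical helper for illustration; not taken from the thesis.
    """

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.items = deque()  # items currently inside the window
        self.total = 0.0      # running sum of the items in the window

    def push(self, value):
        """Admit a new item, evict the oldest if needed, return the new answer."""
        self.items.append(value)
        self.total += value
        if len(self.items) > self.window:
            self.total -= self.items.popleft()
        return self.total / len(self.items)


if __name__ == "__main__":
    query = SlidingWindowAverage(window=3)
    for reading in [10.0, 20.0, 30.0, 40.0]:
        print(query.push(reading))
```

Keeping a running sum makes each update O(1); over very long streams the sum can accumulate floating-point rounding error, which a periodic recomputation would bound.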
This thesis studies the issues above in depth. We propose an efficient de-noising and repairing algorithm to improve the precision of all query processing; we propose several synopsis (& summary) generation algorithms for time-series and multi-dimensional data streams to improve the efficiency and precision of approximate query execution; we put forward a correlation-exploration algorithm, based on low-rank matrix approximation theory, to detect correlations between multi-dimensional data streams; we design and implement a forecasting model to predict future values of time-series data streams; finally, we design an online classifier that adapts to concept drift in data streams to classify streaming data efficiently.

The main contributions of this dissertation include the following:

(1) We discuss the problems of detecting and repairing outliers in a data stream environment. An online outlier detection method for data streams, called AKF (Amnesia Kalman Filtering), is proposed. To identify outliers, it applies an improved Kalman filter with an amnesia (forgetting) factor to forecast the data value at a future timestamp. A novel online adaptive repairing method for outliers over data streams, called AdaptiveIW (Adaptive Interpolating Wavelet), is then proposed. AdaptiveIW repairs outliers with a variable-resolution interpolation method, the interpolating wavelet with adaptive resolution, which determines the interpolation resolution from the number of consecutive outliers. It adapts well to the different precision requirements for outlier repair over evolving data streams. Experimental results on actual power load data show that this method provides precise, instantaneous detection and accurate repair of outliers over data streams.

(2) For extracting characteristics from data streams, we studied several synopsis & summary generation methods, including sampling, histograms, and wavelets. We improved several synopsis & summary generation algorithms, targeting their limitations in certain performance aspects. We proposed a reference framework for a data stream management and analysis system based on diversified synopses & summaries. As a realization of this system framework, we designed a novel parallel data streams...
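The AKF idea can be illustrated with a generic fading-memory Kalman filter. The sketch below is an assumption-laden stand-in, not the thesis's AKF: it uses a scalar random-walk state model, multiplies the predicted covariance by an amnesia factor `lam` >= 1 so that older readings count less, and flags a reading as an outlier when its innovation exceeds `k` standard deviations. The function name `detect_outliers` and all parameter values (`lam`, `q`, `r`, `k`) are hypothetical.

```python
import math


def detect_outliers(readings, lam=1.05, q=1e-3, r=1.0, k=3.0):
    """Return one flag per reading: True if it looks like an outlier.

    Hypothetical sketch: scalar random-walk Kalman filter whose predicted
    covariance is inflated by an amnesia (fading-memory) factor lam >= 1.
    q: process noise variance, r: measurement noise variance,
    k: threshold in standard deviations of the innovation.
    """
    if not readings:
        return []
    x, p = readings[0], r  # initialise the state from the first reading
    flags = [False]
    for z in readings[1:]:
        # Predict: the state carries over, uncertainty grows by lam and q.
        x_pred, p_pred = x, lam * p + q
        innovation = z - x_pred
        s = p_pred + r  # innovation variance
        if abs(innovation) > k * math.sqrt(s):
            flags.append(True)
            # Keep the forecast; the suspected outlier does not update the state.
            x, p = x_pred, p_pred
            continue
        flags.append(False)
        gain = p_pred / s
        x = x_pred + gain * innovation
        p = (1.0 - gain) * p_pred
    return flags


if __name__ == "__main__":
    load = [10.0, 10.2, 9.9, 10.1, 50.0, 10.0, 9.8]
    print(detect_outliers(load))
```

A rejected reading does not update the state, so a single spike does not drag the forecast; because the predicted covariance keeps growing during a long run of rejections, a genuine level shift is eventually accepted.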
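AdaptiveIW's key idea is that the interpolation resolution depends on how many consecutive outliers must be filled. The sketch below does not implement an interpolating wavelet transform; it substitutes plain cubic Lagrange interpolation through two trusted samples on each side of the gap, doubling the sample spacing `h` until it covers the run, which loosely mimics moving to a coarser resolution for longer runs. The names `lagrange` and `repair_run`, and the assumption that the stencil samples are outlier-free, are illustrative.

```python
def lagrange(nodes, values, t):
    """Evaluate the polynomial through the points (nodes[i], values[i]) at t."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        term = yi
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (t - xj) / (xi - xj)
        total += term
    return total


def repair_run(series, start, end):
    """Overwrite series[start:end], a run of consecutive outliers, in place.

    Stand-in for an adaptive-resolution scheme (not an interpolating wavelet):
    the stencil spacing h is doubled until it is at least the run length, so
    longer runs are filled from more distant, coarser trusted samples.
    Assumes the four stencil samples are themselves trusted.
    """
    run = end - start
    h = 1
    while h < run:
        h *= 2
    left, right = start - 1, end  # nearest trusted samples on each side
    nodes = [left - h, left, right, right + h]
    if nodes[0] < 0 or nodes[-1] >= len(series):
        raise ValueError("not enough trusted samples around the run")
    values = [series[n] for n in nodes]
    for t in range(start, end):
        series[t] = lagrange(nodes, values, t)
    return series
```

Because a cubic through four points reproduces any cubic trend exactly, smooth load curves are restored closely; rougher signals will show larger errors as `h` grows.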
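The keyword list mentions non-equal probability sampling. One well-known one-pass way to draw such a sample is the Efraimidis-Spirakis A-Res weighted reservoir; the sketch below uses it as an example of the sampling family studied in (2), not as the thesis's algorithm. The function name `weighted_reservoir` is hypothetical.

```python
import heapq
import random


def weighted_reservoir(stream, k, rng=None):
    """Sample k items without replacement, favouring heavier weights.

    A-Res: each item gets key u ** (1 / w) with u ~ U(0, 1); the k items with
    the largest keys are kept. One pass over (item, weight) pairs, O(k) memory.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, index, item); the smallest key is evicted first
    for i, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # zero-weight items can never be selected
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, i, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, i, item))
    return [item for _, _, item in heap]
```

The arrival index breaks ties in the heap, so the sampled items themselves never need to be comparable.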
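Wavelet synopses are another family named in (2). A common construction keeps only the B largest-magnitude coefficients of an orthonormal Haar transform; because the transform is orthonormal, this choice minimizes the L2 reconstruction error for a given budget. The sketch assumes the window length is a power of two, and the function names `haar`, `inverse_haar`, `synopsis`, and `reconstruct` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar(values):
    """Orthonormal Haar transform; len(values) must be a power of two."""
    out = [float(v) for v in values]
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    length = n
    while length > 1:
        half = length // 2
        pairs = [(out[2 * i], out[2 * i + 1]) for i in range(half)]
        out[:half] = [(a + b) / SQRT2 for a, b in pairs]        # scaled sums
        out[half:length] = [(a - b) / SQRT2 for a, b in pairs]  # scaled differences
        length = half
    return out


def inverse_haar(coeffs):
    """Invert haar() level by level, from the coarsest to the finest."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        sums, diffs = out[:length], out[length:2 * length]
        merged = []
        for a, d in zip(sums, diffs):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * length] = merged
        length *= 2
    return out


def synopsis(values, budget):
    """Keep the `budget` largest-magnitude coefficients as {index: value}."""
    coeffs = haar(values)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return {i: coeffs[i] for i in order[:budget]}


def reconstruct(sparse, n):
    """Approximate the original window from a sparse coefficient synopsis."""
    dense = [0.0] * n
    for i, c in sparse.items():
        dense[i] = c
    return inverse_haar(dense)
```

With the full budget the window is recovered exactly; with a smaller budget the squared error equals the energy of the dropped coefficients.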
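The correlation-detection work combines canonical correlation analysis with low-rank approximation. As a hedged illustration of that combination (not the thesis's algorithm), canonical correlations between two windows of multi-dimensional streams can be computed as the singular values of U_x^T U_y, where U_x and U_y are orthonormal bases of the centred windows obtained by a thin SVD; truncating those bases to `rank` columns gives a low-rank approximation. NumPy is assumed, and `canonical_correlations` is a hypothetical name.

```python
import numpy as np


def canonical_correlations(x, y, rank=None, tol=1e-10):
    """Canonical correlations between the columns of two stream windows.

    x: (n, p) and y: (n, q) windows sharing the same n timestamps.
    Each window's column space is represented by an orthonormal basis from a
    thin SVD; keeping only `rank` columns gives a low-rank approximation.
    """
    def basis(m):
        m = m - m.mean(axis=0)  # centre each column
        u, s, _ = np.linalg.svd(m, full_matrices=False)
        u = u[:, s > tol * s.max()]  # drop numerically zero directions
        return u if rank is None else u[:, :rank]

    ux = basis(np.asarray(x, dtype=float))
    uy = basis(np.asarray(y, dtype=float))
    # Singular values of Ux^T Uy are the cosines of the principal angles,
    # i.e. the canonical correlations; clipping guards against rounding above 1.
    return np.clip(np.linalg.svd(ux.T @ uy, compute_uv=False), 0.0, 1.0)
```

A leading value near 1 signals that some linear combination of one stream's attributes tracks a combination of the other's within the window.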
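For the concept-drift classifier, the simplest baseline that adapts to drift is to train only on a sliding window of recent labelled items, so stale concepts are forgotten. The nearest-centroid sketch below shows that idea; it is a generic baseline, not the classifier designed in the thesis, and `WindowedCentroidClassifier` and its window size are assumptions.

```python
from collections import Counter, deque


class WindowedCentroidClassifier:
    """Nearest-centroid classifier over the last `window` labelled items."""

    def __init__(self, window):
        self.window = window
        self.buffer = deque()    # (features, label) in arrival order
        self.sums = {}           # label -> per-feature running sum
        self.counts = Counter()  # label -> number of items in the window

    def _update(self, features, label, sign):
        acc = self.sums.setdefault(label, [0.0] * len(features))
        for j, v in enumerate(features):
            acc[j] += sign * v
        self.counts[label] += sign
        if self.counts[label] == 0:  # label no longer present in the window
            del self.counts[label]
            del self.sums[label]

    def learn(self, features, label):
        """Add a labelled item; evict the oldest one once the window is full."""
        self.buffer.append((features, label))
        self._update(features, label, +1)
        if len(self.buffer) > self.window:
            old_features, old_label = self.buffer.popleft()
            self._update(old_features, old_label, -1)

    def predict(self, features):
        """Return the label whose window centroid is closest (None if empty)."""
        best, best_dist = None, float("inf")
        for label, n in self.counts.items():
            centroid = [s / n for s in self.sums[label]]
            dist = sum((a - b) ** 2 for a, b in zip(features, centroid))
            if dist < best_dist:
                best, best_dist = label, dist
        return best
```

The window size trades stability against reaction speed: a short window follows drift quickly but yields noisier centroids.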
Keywords/Search Tags: data streams, interpolating wavelet, kalman filtering, outliers detecting and repairing, synopsis & summary, canonical correlation analysis, low-rank approximation, non-equal probability sampling, forecasting, adaptive precision, classification