
Multi-dimensional Probabilistic Regression Over Imprecise Data Streams

Posted on: 2024-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: R Gao
Full Text: PDF
GTID: 2568306932462104
Subject: Computer software and theory
Abstract/Summary:
The Web of Things (WoT) uses standard Web protocols to enable data integration across systems and applications. Its underlying layers are built from many devices of different types, sources, and operating protocols, including GPS receivers, sensors, and so on. These devices continuously generate massive amounts of streaming data. The stream data are inherently imprecise, for example because of the limited measurement confidence of the sensors themselves and errors in transmission. Because every possible value of an uncertain data item must be considered during analysis, the overhead of analyzing IoT data surges. At the same time, the data streams are too large to be stored locally for repeated reading, so they must be processed in real time. Processing such uncertain stream data in real time while accounting for its multiple possibilities is therefore a challenging task. This dissertation manages uncertain data streams through probabilistic modeling and proposes probabilistic data streams, i.e., sets of probabilistic data objects that flow in time, each of which has several probabilistic instances and an existence probability. It also proposes a multi-dimensional probabilistic regression analysis over probabilistic data streams to analyze the trend of the streams. The main research contents of this dissertation are as follows:

(1) Probabilistic Regression Processing Framework. This dissertation presents a probabilistic regression processing framework that integrates the techniques studied in it. Based on the tuple uncertainty model, it models imprecise data to obtain a probabilistic data stream. It then extends traditional regression analysis and introduces probabilistic regression along with the corresponding definitions. In addition, it investigates the data stream cube to manage the multi-dimensional attributes of IoT data streams. Together, these results provide an effective solution for trend analysis of uncertain and dynamic streaming data.

(2) Online Processing Techniques. To monitor and analyze probabilistic data streams in real time, this dissertation proposes convolution-based and sketch-based probabilistic regression techniques. To account for differences in the arrival times of data stream objects, it introduces two regression operations, namely internal regression and external regression. Based on these techniques, it proposes an online materialization algorithm that materializes data cuboids in real time as data objects arrive, thus enabling real-time processing of data streams.

(3) Probabilistic Anomaly Query. This dissertation investigates probabilistic anomaly query techniques for preference trend analysis, with the aim of facilitating trend monitoring. Building on this, it further proposes a sketch-based pruning algorithm that selectively filters deterministic data cells to improve query efficiency.

(4) Batch Processing Techniques. This dissertation explores batch processing techniques, including probabilistic aggregation and batch materialization. Depending on the type of dimension, it introduces time aggregation and non-time aggregation. Building on these techniques, it proposes a semi-distributed batch materialization algorithm, which reuses the probabilistic regression values of ancestor data cells to reduce computational cost and supports more flexible materialization of data stream cubes.
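As a rough illustration of the probabilistic data objects described in the abstract, the following Python sketch enumerates every possible world of a tiny stream window and collects the distribution of the least-squares slope. It is a brute-force illustration only, not the dissertation's convolution-based or sketch-based technique. The data layout, the sample values, and the names `stream`, `slope`, and `slope_distribution` are all assumptions made for this example.

```python
from itertools import product

# Hypothetical layout: each object is (timestamp, [(value, probability), ...]).
# The instance probabilities sum to the object's existence probability (<= 1).
stream = [
    (1, [(2.0, 0.5), (3.0, 0.5)]),
    (2, [(4.0, 1.0)]),
    (3, [(6.0, 0.8)]),  # this object is absent with probability 0.2
]

def slope(points):
    """Ordinary least-squares slope of value against time."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return sxy / sxx

def slope_distribution(objects):
    """Enumerate all possible worlds and return {slope: probability}."""
    choices = []
    for t, instances in objects:
        options = [((t, v), p) for v, p in instances]
        absent = 1.0 - sum(p for _, p in instances)
        if absent > 1e-12:
            options.append((None, absent))  # the object does not exist
        choices.append(options)

    dist = {}
    for world in product(*choices):
        prob = 1.0
        points = []
        for point, p in world:
            prob *= p
            if point is not None:
                points.append(point)
        if len(points) < 2:
            continue  # the slope is undefined with fewer than two points
        key = round(slope(points), 9)
        dist[key] = dist.get(key, 0.0) + prob
    return dist

if __name__ == "__main__":
    for s, p in sorted(slope_distribution(stream).items()):
        print(f"slope={s} probability={p:.3f}")
```

The number of possible worlds is the product of the per-object option counts, so this enumeration grows exponentially with the window size. That growth is the analysis overhead the abstract refers to.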
Keywords/Search Tags: probabilistic regression, data stream processing, data warehouse, probabilistic aggregation, probabilistic anomaly query