
Research On Approximate Computing And Quality Assurance Strategies In Large-scale Stream Data Processing

Posted on: 2020-03-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Liu
Full Text: PDF
GTID: 1368330602955532
Subject: Computer system architecture
Abstract/Summary:
In recent years, streaming computation, which provides real-time processing capability, has become a hotspot of big data research and application. Although distributed stream processing systems supporting online processing have been widely adopted to speed up data processing, the exponential growth of data and ever-stricter real-time demands pose great challenges. Approximate computing can effectively ease the tension between high processing cost and timeliness by sacrificing a small amount of precision. It is therefore of vital importance to study approximation techniques for real-time stream data, with the goals of improving processing efficiency, reducing resource consumption, and meeting real-time requirements. However, while approximation improves processing efficiency, it also reduces output accuracy. Effective approximate processing presupposes appropriate quality assessment and assurance; uncontrolled quality loss offsets the gains of approximation. Three problems thus demand urgent solutions: how to choose an approximation method suited to a given application, how to assess the quality of approximate results, and how to set different degrees of approximation at the corresponding stages so as to minimize accuracy loss.

Motivated by these problems, this dissertation studies approximate processing techniques and data quality in large-scale stream data applications. Taking sampling-based approximation as the core, it designs both general and application-specific approximate processing methods and quality assurance strategies for stream data, jointly considering data volume, processing capacity, and data quality. The specific work and main contributions are as follows:

1. From the perspective of processing capacity, we propose online adaptive approximate processing and error control strategies for large data streams whose volume exceeds the available computing capacity. To address data cognition acquisition and output error control in real-time stream analysis, we propose a dynamic, adaptive approximate data analysis framework. First, an online learning strategy for continuously arriving data automatically learns weights for data strata and triggers updates according to feedback information. Then, a sampling-based approximation algorithm accounts for the influence of real-time load changes on sampling resource demand. Finally, a customizable online error control strategy, driven by user-specified error requirements, monitors the approximate output and promptly corrects large errors.
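To make contribution 1 concrete, the Python sketch below combines per-stratum reservoir sampling with a variance-based confidence bound and a crude feedback rule that enlarges the sampling budget whenever the bound exceeds the user's relative error target. All names and parameters are illustrative assumptions; the dissertation's framework adds online stratum-weight learning and load-aware resource allocation on top of this basic loop.

    # Minimal sketch: stratified reservoir sampling with a rough error bound
    # and feedback-driven budget growth. Illustrative only, not the
    # dissertation's actual framework.
    import math
    import random
    from collections import defaultdict

    class ApproximateMean:
        """Approximate the stream's mean from bounded per-stratum samples."""
        def __init__(self, budget_per_stratum=100, target_rel_error=0.05):
            self.budget = budget_per_stratum
            self.target = target_rel_error
            self.samples = defaultdict(list)   # stratum -> reservoir sample
            self.seen = defaultdict(int)       # stratum -> items observed so far

        def offer(self, stratum, value):
            """Algorithm R: keep a uniform random sample within each stratum."""
            self.seen[stratum] += 1
            res = self.samples[stratum]
            if len(res) < self.budget:
                res.append(value)
            else:
                j = random.randrange(self.seen[stratum])
                if j < self.budget:
                    res[j] = value

        def estimate(self):
            """Stratified mean estimate plus a rough 95% confidence bound."""
            total = sum(self.seen.values())
            mean = var = 0.0
            for s, res in self.samples.items():
                w = self.seen[s] / total               # stratum weight N_h / N
                m = sum(res) / len(res)
                v = sum((x - m) ** 2 for x in res) / max(len(res) - 1, 1)
                mean += w * m
                var += w * w * v / len(res)
            bound = 1.96 * math.sqrt(var)
            # Crude error control: if the relative bound misses the target,
            # enlarge the budget so future samples are drawn more densely.
            if mean != 0 and abs(bound / mean) > self.target:
                self.budget *= 2
            return mean, bound

    # Usage: stream (stratum, value) pairs, then read off estimate and bound.
    agg = ApproximateMean(budget_per_stratum=50, target_rel_error=0.02)
    for i in range(10000):
        agg.offer(i % 3, random.gauss(i % 3, 1.0))
    print(agg.estimate())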
2. From the perspective of sampling-node deployment, we address approximate data collection and reconstruction for large-scale sensor data by optimizing where sampling nodes are placed. For a concrete underwater sensor network application, we propose an approximate data collection strategy built on a backbone network, which accounts for the impact on data quality of both the approximation itself and the frequent data loss underwater. A belief propagation algorithm then infers the missing data, covering both readings deliberately left uncollected and readings lost in transmission; by jointly exploiting temporal, spatial, and multivariate correlations, it achieves high-quality data recovery (a simplified inference sketch appears after the summary below). Finally, to ensure the data quality meets user requirements, we propose a statistics-based quality evaluation method that assesses and improves the inferred data.

3. From the perspective of the importance of data resources, we address (near) real-time collection of sensing data streams under differing data loss rates. Again in the underwater sensor network scenario, we propose a low-cost, high-quality approximate data collection method based on recurrent neural networks (RNNs). First, automatic retransmission in the transport protocol is disabled during data transmission, shifting the handling of the resulting data loss to the data center. To cope with the high loss rates of underwater wireless sensor networks (UWSNs), we propose an RNN-based data learning model that exploits missingness features and inter-variable correlations to impute and predict missing values for both space-related and variable-related data (sketched after the summary below).

In summary, for both general and application-specific scenarios, this dissertation designs efficient approximate processing strategies for data streams while accounting for the coupling between resource scheduling and result quality, with in-depth studies of approximate collection methods at data sources, approximate analysis methods, and error analysis and control mechanisms.
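To give the flavor of the inference step in contribution 2, the Python sketch below replaces full belief propagation with its simplest special case: each missing reading is repeatedly re-estimated as the mean of its neighbors on the sensor graph, a fixed-point iteration that, for a basic Gaussian model on a tree-shaped graph, converges to the posterior means belief propagation would compute. The graph layout and readings are fabricated for illustration; the dissertation's algorithm additionally weighs temporal and multivariate evidence.

    # Stripped-down stand-in for belief-propagation inference of missing
    # readings: iterative neighbor averaging on the sensor graph.
    def infer_missing(values, neighbors, iters=50):
        """values: node -> reading or None; neighbors: node -> list of nodes."""
        est = {n: (v if v is not None else 0.0) for n, v in values.items()}
        for _ in range(iters):
            for n, v in values.items():
                if v is None:                      # only impute missing nodes
                    nbr = [est[m] for m in neighbors[n]]
                    est[n] = sum(nbr) / len(nbr)   # posterior mean under the model
        return est

    # Fabricated four-node chain with two missing readings.
    readings = {"s1": 7.2, "s2": None, "s3": 7.8, "s4": None}
    graph = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2", "s4"], "s4": ["s3"]}
    print(infer_missing(readings, graph))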
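In the spirit of contribution 3, the following minimal PyTorch sketch shows mask-aware RNN imputation: each reading is paired with a bit indicating whether it was observed, a GRU is trained to reconstruct the observed entries, and its outputs fill the gaps. The architecture, sizes, and synthetic trace are illustrative assumptions, not the dissertation's actual model.

    # Minimal mask-aware GRU imputer on a synthetic sensor trace.
    import torch
    import torch.nn as nn

    class GRUImputer(nn.Module):
        def __init__(self, hidden=32):
            super().__init__()
            self.rnn = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
            self.out = nn.Linear(hidden, 1)

        def forward(self, x, mask):
            # Missing slots are zeroed; the mask channel tells the RNN which
            # inputs are real so it can learn to ignore the placeholders.
            inp = torch.stack([x * mask, mask], dim=-1)
            h, _ = self.rnn(inp)
            return self.out(h).squeeze(-1)

    # Synthetic sensor trace with roughly 30% of readings dropped.
    torch.manual_seed(0)
    x = torch.sin(torch.linspace(0, 12, 200)).unsqueeze(0)   # shape (1, 200)
    mask = (torch.rand_like(x) > 0.3).float()

    model = GRUImputer()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):                       # fit only the observed entries
        pred = model(x, mask)
        loss = (((pred - x) * mask) ** 2).sum() / mask.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Keep observed readings; fill the gaps with the model's predictions.
    imputed = torch.where(mask.bool(), x, model(x, mask).detach())

Feeding the mask as an input channel, rather than silently zero-filling, is what lets a single trained network cope with variable, bursty loss patterns instead of one fixed missing rate.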
Keywords/Search Tags: Stream data processing, approximate computing, aggregate query, underwater wireless sensor network, data collection, data quality