Systematic Representation And Query Processing Of Uncertain Data Based On Bayesian Network

Posted on: 2015-05-11 | Degree: Master | Type: Thesis
Country: China | Candidate: L Zhu | Full Text: PDF
GTID: 2208330467488677 | Subject: Computer software and theory
Abstract/Summary:
In an era of rapidly expanding data volumes, data storage, management, and analysis face many challenges and opportunities. Data contain many sources of uncertainty, and this uncertainty makes storage, management, and analysis considerably harder. Data lineage can reduce the complexity of this problem. Lineage follows how data arise and evolve in order to trace the source of uncertainty, and it is widely used in query optimization, integration, and quality assurance for uncertain data. Computing the probabilities of query results from lineage expressions, and answering probabilistic reasoning queries over them, poses two major challenges. At present, the Bayesian network is a feasible approach to these challenges. It is a classic tool in artificial intelligence for representing and reasoning about uncertain knowledge, and it is widely used in data mining, medicine, military applications, pattern recognition, and community detection.

To address the challenges of managing and analyzing uncertain data, this thesis uses data lineage to represent and analyze uncertain data on the basis of Bayesian network theory. The main work therefore covers lineage for uncertain data and query processing. The contributions are summarized as follows:

● First, a formal representation of the time-series process of data lineage. For clarity, the thesis uses queries on a database with probabilities to describe the process of data lineage over data associated with probabilities. Current methods use Boolean expressions to describe data lineage; this thesis extends the lineage representation with temporal characteristics.
● Second, the transformation of lineage expressions and the construction of a Bayesian network, a typical probabilistic graphical model. The temporal characteristics of lineage expressions are used to build the model in each time slice as a directed acyclic graph, and the time-series relationships between adjacent time slices are then constructed. The result is called the temporal multiple lineages graph model.
● Third, query processing based on the probabilistic graphical model. Based on the characteristics of probabilistic reasoning in the temporal multiple lineages graph model, the thesis extends the probabilistic inference method for data lineage query processing.

Finally, the thesis implements a prototype system to demonstrate the representation and processing of data lineage over uncertain data.
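The idea of computing a query result's probability from a Boolean lineage expression can be illustrated with a minimal Python sketch. This is not the thesis's method: it assumes the base tuples are mutually independent, it enumerates every possible world directly rather than building a Bayesian network, and the tuple names `t1`, `t2`, `t3`, their probabilities, and the function `lineage_probability` are all hypothetical.

```python
from itertools import product


def lineage_probability(lineage, base_probs):
    """Probability that a Boolean lineage expression holds.

    Assumption (not from the thesis): base tuples are independent, so the
    weight of a possible world is the product of per-tuple probabilities.
    """
    names = sorted(base_probs)
    total = 0.0
    # Enumerate every truth assignment (possible world) of the base tuples.
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name in names:
            p = base_probs[name]
            weight *= p if world[name] else 1.0 - p
        # Add the world's weight if the lineage expression is true in it.
        if lineage(world):
            total += weight
    return total


# Hypothetical base tuples and their existence probabilities.
base_probs = {"t1": 0.5, "t2": 0.4, "t3": 0.9}


# Hypothetical lineage of one result tuple: (t1 AND t2) OR t3.
def lineage(world):
    return (world["t1"] and world["t2"]) or world["t3"]


result = lineage_probability(lineage, base_probs)
print(round(result, 4))  # 0.92
```

Enumeration visits 2^n worlds for n base tuples, so it is only practical for very small lineage expressions; structured inference over a graphical model is the usual way to avoid that cost.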
Keywords/Search Tags:Uncertainty Data, Data Provenance, Probabilistic Graphical Model, Mutual-Level Bayesian Network, Query Processing