Uncertainty Bayesian Networks Based On Data Cleaning

Posted on: 2014-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: X G Li
Full Text: PDF
GTID: 2268330401954117
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, the volume of data is growing sharply. Uncertain data, which arises from inaccuracies in the raw data itself or from coarse-grained data collection, is ubiquitous in fields such as economics, the military, logistics, finance, telecommunications, and scientific computing. Traditional relational database approaches cannot fully meet the needs of processing uncertain data in an uncertain relational database. Data cleaning, an important method for improving data quality and query results, is receiving more and more attention.

When select-project-join (SPJ) queries are processed over an uncertain relational database whose tuples carry a probability dimension, the probability attached to each output tuple gives users an important reference and basis for decision-making. This thesis targets the real-time and accuracy requirements of SPJ query processing in uncertain databases and focuses on computing the probabilities of query output tuples. It uses Bayesian networks, an important tool for representing and reasoning about uncertain knowledge, together with the specific characteristics of query plans over uncertain databases, to study the data cleaning problems that arise in SPJ query processing.

The main work of this thesis can be summarized as follows:

· Constructing the Bayesian network from uncertain query plans. Using the idea of graph traversal, the thesis starts from the characteristics of the uncertain query plan and constructs the directed acyclic graph (DAG) structure of the Bayesian network. The conditional probability table of each node in the DAG is calculated from the causal dependencies among the tuples in the query plan. This completes the construction of the query Bayesian network (QBN), which captures those causal dependencies and serves as the basis for the subsequent probability cleaning.

· Probabilistic reasoning, which is the direct purpose of constructing the QBN. To output query results in real time and accurately, and considering the specific characteristics of SPJ queries, Gibbs sampling is adopted as the approximate inference algorithm over the QBN to calculate the probability values of the query output tuples, so that users are given the most probable correct answer as a reference.

· Cleaning the probability values. Based on the probability values calculated by QBN inference, the thesis defines a method for comparing the probability value stored in a query output tuple with the value obtained by QBN inference, so that the probability dimension of the tuples can be cleaned.

· Implementation and evaluation. The QBN construction, the inference, and the corresponding cleaning method for probability values were implemented in a program. The efficiency of the QBN construction, the convergence of the inference algorithm, and the accuracy of the data cleaning were tested. The results show that the proposed method is feasible and efficient.
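The inference and cleaning steps summarized above can be illustrated with a minimal Python sketch. It is not the thesis implementation. The three-node network, its probability values, the node names (r1, s1, out), the function names, and the tolerance-based cleaning rule are all assumptions made for illustration; the abstract does not give the actual network structure or comparison method. The conditional probabilities are kept strictly between 0 and 1 because Gibbs sampling can get stuck when dependencies are fully deterministic.

```python
import random

# Hypothetical QBN for one join: two base tuples (r1, s1) and one output tuple.
# Each node maps to (parents, {parent values: P(node = 1)}). Values are made up.
QBN = {
    "r1": ((), {(): 0.8}),
    "s1": ((), {(): 0.6}),
    "out": (("r1", "s1"),
            {(0, 0): 0.01, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.95}),
}


def joint(network, state):
    """Joint probability of a full 0/1 assignment: product of CPT entries."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        p_one = cpt[tuple(state[q] for q in parents)]
        p *= p_one if state[node] == 1 else 1.0 - p_one
    return p


def gibbs_marginal(network, target, n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(target = 1) by Gibbs sampling over all nodes."""
    rng = random.Random(seed)
    state = {node: rng.randint(0, 1) for node in network}
    hits = 0
    for step in range(burn_in + n_samples):
        for node in network:
            # Full conditional of this node: joint with node=1 vs node=0,
            # normalized, with every other node held at its current value.
            state[node] = 1
            w_one = joint(network, state)
            state[node] = 0
            w_zero = joint(network, state)
            state[node] = 1 if rng.random() < w_one / (w_one + w_zero) else 0
        if step >= burn_in and state[target] == 1:
            hits += 1
    return hits / n_samples


def clean_probability(stored, inferred, tolerance=0.05):
    """Assumed cleaning rule: replace the stored value if it is off by more
    than the tolerance; otherwise keep it unchanged."""
    return inferred if abs(stored - inferred) > tolerance else stored


if __name__ == "__main__":
    inferred = gibbs_marginal(QBN, "out")
    print("inferred P(out = 1):", inferred)
    print("cleaned value for stored 0.90:", clean_probability(0.90, inferred))
```

For this toy network the exact marginal can be checked by enumerating the four parent combinations, which gives P(out = 1) = 0.4788, so the sampled estimate should land close to that value. Recomputing the full joint for each update is only practical for very small networks; a larger QBN would restrict the computation to each node's Markov blanket.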
Keywords/Search Tags:Uncertainty data, Data cleaning, Bayesian network, Approximate inference