Uncertainty Bayesian Networks Based On Data Cleaning

Posted on:2014-07-23

Degree:Master

Type:Thesis

Country:China

Candidate:X G Li

Full Text:PDF

GTID:2268330401954117

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of information technology, the volume of data is increasing sharply. Uncertain data, which results from inaccuracy in the raw data itself or from coarse-grained data collection, is ubiquitous in fields such as economics, the military, logistics, finance, telecommunications, and scientific computing. Traditional relational database approaches cannot fully meet the needs of uncertain data processing in an uncertain relational database. Data cleaning, an important method for improving data quality and query results, is receiving more and more attention. In SPJ (select-project-join) query processing over an uncertain relational database whose tuples carry a probability dimension, the probability dimension in the query output provides an important reference and basis for users' decision-making.

Aiming at the real-time and accuracy requirements of SPJ query processing in the uncertain database, this thesis focuses on calculating the probability of query output tuples. It uses the Bayesian network, an important tool for representing and reasoning about uncertain knowledge, together with the specific characteristics of uncertain database query plans, to study data cleaning problems in SPJ query processing.

The main work of this thesis can be summarized as follows:

· The Bayesian network is constructed from the uncertain query plans. Using the idea of graph traversal, the thesis starts from the characteristics of the uncertain query plan and constructs the directed acyclic graph (DAG) structure of the Bayesian network. The conditional probability table of each node in the DAG is calculated according to the causal dependencies among the tuples of the query plan. This completes the construction of the query Bayesian network (QBN), which contains these causal dependencies and is the basis for the subsequent probability cleaning.
· Probabilistic reasoning is the direct purpose of constructing the QBN. To output query results accurately and in real time, and considering the specific characteristics of SPJ queries, Gibbs sampling is adopted as the approximate reasoning algorithm over the QBN to calculate the probability values of query output tuples, so that users are provided with the most probable answer as a reference.
· Based on the probability values calculated by QBN inference, a method is defined to compare the probability value in each query output tuple with the value obtained by QBN reasoning, so that the probability-dimension data in the tuples can be cleaned.
· The QBN construction, the reasoning, and the corresponding cleaning methods for probability values were implemented. The efficiency of the QBN construction, the convergence of the inference algorithm, and the accuracy of the data cleaning were tested. The results show that the proposed method is feasible and efficient.
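
The Gibbs sampling and comparison steps summarized above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The three-node network (two base tuples `t1` and `t2` feeding a join output tuple `o`), all probability values, the function names, and the threshold-based cleaning rule are hypothetical assumptions chosen only for illustration. The abstract does not specify how the comparison is defined.

```python
import random

# Hypothetical network: base tuples t1, t2 are parents of join output tuple o.
# Every number below is an illustrative assumption, not data from the thesis.
P_T1 = 0.8  # P(t1 = 1)
P_T2 = 0.6  # P(t2 = 1)
# P(o = 1 | t1, t2); kept non-deterministic so the Gibbs chain can reach every state.
P_O = {(1, 1): 0.9, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.05}


def joint(t1, t2, o):
    """Joint probability P(t1, t2, o) from the network factorization."""
    p = (P_T1 if t1 else 1 - P_T1) * (P_T2 if t2 else 1 - P_T2)
    p_o = P_O[(t1, t2)]
    return p * (p_o if o else 1 - p_o)


def gibbs_marginal_o(n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(o = 1) by resampling each variable given the other two."""
    rng = random.Random(seed)
    state = [1, 1, 1]  # [t1, t2, o], arbitrary starting point
    hits = 0
    for step in range(n_samples + burn_in):
        for i in range(3):
            state[i] = 1
            p1 = joint(*state)
            state[i] = 0
            p0 = joint(*state)
            # Full conditional of variable i, normalized over its two values.
            state[i] = 1 if rng.random() < p1 / (p1 + p0) else 0
        if step >= burn_in:
            hits += state[2]
    return hits / n_samples


def exact_marginal_o():
    """Exact P(o = 1) by enumeration, feasible only for tiny networks."""
    return sum(joint(t1, t2, 1) for t1 in (0, 1) for t2 in (0, 1))


def clean_probability(stored, inferred, tolerance=0.05):
    """Assumed cleaning rule: replace the stored value if it deviates too far."""
    return inferred if abs(stored - inferred) > tolerance else stored
```

Calling `clean_probability(stored, gibbs_marginal_o())` would then keep or replace a stored probability depending on the assumed tolerance. In a network this small, `exact_marginal_o()` gives a reference value against which the sampler's convergence can be checked.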

Keywords/Search Tags:

Uncertainty data, Data cleaning, Bayesian network, Approximate inference

Related items

1	The Research On Approximate Inference Algorithm For Dynamic Bayesian Networks
2	The Research On Inference Algorithm For Bayesian Networks Based On Sampling
3	Probabilistic Graphical Models Based On Data Cleaning
4	Research On Fast Approximate Algorithm Based Bayesian Inference
5	Based On Bayesian Network Cube Uncertainty Knowledge Representation And Reasoning Method
6	Facial Expression Recognition Based On Bayesian Inference
7	Study Of Data Cleaning Algorithms Based On Data Warehouse
8	Research On Multi-source Heterogeneous Large Data Cleaning Technology Based On Machine Learning
9	Research On Data Cleaning Based On Science And Technology Innovation Big Data Public Platform
10	The Research On The Key Problems Of The Approximate Bayesian Computational Inverse Problem With Uncertainty