Font Size: a A A

Research On Update And Query Optimization In Probabilistic Relational Databases With Integrity Constraints

Posted on:2017-06-03Degree:DoctorType:Dissertation
Country:ChinaCandidate:C C ZhangFull Text:PDF
GTID:1318330485450828Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Modern applications, such as data cleaning, sensor networks, tracking moving object-s, emerge an increasing demand for managing uncertain data. As an important uncertain data management model, probabilistic relational databases receive wide attention attention from academia and industry since 2003. Informally, a probabilistic database is a probability distribution over a set of deterministic databases (namely, possible worlds). Integrity con-straints are important information of relational data. Therefore, there are three important research problems of probabilistic relational databases:How do we represent uncertain data with integrity constraints? How do we update uncertain data efficiently? How do we answer queries efficiently using our chosen representation?Most current studies on uncertain data model focus on representing correlations among data, without taking into account integrity constraints. To tackle this challenge, a proba-bilistic relational data model with integrity constraints is proposed. Integrity constraints can capture the correlations among data after updates. Therefore, the correlations among da-ta can be updated automatically with conditioning probabilistic relational databases, which guarantee that no violated possible worlds are included in probabilistic relational databases, and solve the problem of the state-of-the-art data models focusing on only the correlations among current data. The existing initialization methods that transform the possible worlds representation into our chosen representation, make the formulae of tuples very long. An efficient initiation method is proposed by providing an equation that can generate simplified formulae of tuples. On the analysis of regular pattern in formulae of possible worlds, the equation eliminates duplicate variables in formulae of tuples. The time cost of probability computation of formulae of tuples is reduced in subsequent queries. The experimental study shows that the proposed initiation method greatly simplifies the formulae of tuples without additional time cost. The subsequent queries benefit from the simplified formulae of tuples.The existing methods for conditioning are exponential over the number of variables in the probabilistic database for an arbitrary constraint. Due to the high time complexity of the existing conditioning method, a constraint-based conditioning method is proposed to tackle this challenge. The constraint-based method considers only the variables in the given constraint and applies variable-replaced mechanisms to update the formulae of tuples, while avoiding the replacement of every variable in the probabilistic database. The experimental study shows that the constraint-based method is more efficient than other methods described in the literature in every configurations.Functional dependency constraints are a common type of constraints in relational databases. Two optimization strategies are proposed to further optimize the constraint-based conditioning algorithm for functional dependency constraints. A pruning strategy that traverses the formula of each correlated tuple separately rather than a complex formula composed of formulae of all correlated tuples, reduces the time cost of obtaining the set of satisfactory assignments. Based on the pruning strategy, in order to minimize the number of generated variables, a variable-elimination strategy that combines the satisfactory assign-ments according to the same possible worlds, can benefit the subsequent query processing. The experimental study shows that the pruning strategy can further improve the efficiency of the constraint-based conditioning method; The variable-elimination strategy can signifi-cantly reduce the number of newly generated variables without additional time cost.Most existing optimization strategies for relational algebra queries in probabilistic relational databases focus on accelerating probability computation of lineage expressions of answering tuples. However, none of them take into account simplifying lineage expression during query processing. To this aim, an optimization method that makes use of integrity constraints to generate simplified lineage expressions for query results is proposed. The simplified lineage expressions for two algebra operations are generated under functional dependency constraints and referential constraints separately. Assumption queries in probabilistic relational databases have natural and important applications. To avoid unnecessary updates of probabilistic relational databases in existing general methods of as-sumption queries processing, an optimization method by computing conditional probability is proposed to handle assumption queries. The effectiveness of the optimization strate-gies for relational algebra queries and assumption queries is demonstrated in the experiment.
Keywords/Search Tags:Probabilistic relational databases, Possible worlds, Conditioning, Functional Dependency, Relational queries
PDF Full Text Request
Related items