| In recent years, there have been many theoretical studies on the management of probabilistic XML data. But for the evaluation of twig queries, there is still a lack of efficient algorithms. Therefore, this thesis studies twig query evaluation algorithms on probabilistic XML.First, this thesis proposes a new encoding scheme, pDewey encoding scheme, for probabilistic XML documents, to encode the tags and probabilities of ordinary and distributional nodes in documents. pDewey codes can be used to efficiently compute: The codes, tags and probabilities of the ancestors of any node; The codes, tags and probabilities of the common ancestors of any two nodes. These properties are crucial for matching twig patterns and calculating the probabilities of matches.Next, this thesis proposes a novel streaming scheme, Tag+Probability streaming scheme, for probabilistic XML documents. It partition the set of encoded nodes into several streams based on both the tags and the probabilities of the nodes, such that the nodes in the same stream have the same tag and similar probabilities. It can be used to prune the input data in the process of the evaluation of queries, which saves the I/O cost and remarkably improves the performance.Finally, based on the proposed encoding scheme and streaming scheme, this thesis proposes a new algorithm, pTJFastTP, for twig query evaluation on probabilistc XML documents. As a kind of holistic twig join algorithm, it can match twig patterns against probabilistic XML documents efficiently. In the join phase, it uses pDewey codes to calculate the probabilities of the matches, and prunes the intermediate results according to the probability threshold, which improves the performance furtherIn this thesis, extensive experiments are conducted to evaluate the performance of pTJFastTP. In the experiments, data and twig queries of various structures are used to compare pTJFastTP with ProTwig, which is the state-of-the-art algorithm for twig query evaluation on probabilistic XML. All the results show that, pTJFastTP remarkably outperforms ProTwig in I/O and CPU cost, and has much better data scalability and query scalability. |