Font Size: a A A

Study On Twig Matching Algorithm Over Probabilistic XMLs

Posted on:2011-05-08Degree:MasterType:Thesis
Country:ChinaCandidate:P LiuFull Text:PDF
GTID:2248330395458461Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
With the development of Internet technology, XML are widely adopted as data storing and data exchanging standards. Because of the complexity of external world, uncertain data is in some application fields. Usually the uncertain data can be represented as the probability values in XML document, which is called Probabilistic XML document. Probabilistic XML document has attracted more and more attention. Many algorithms have been created to query on ordinary XML document. But the research of querying the probabilistic XML document is not enough. Algorithms of p-TJFast and ProFirstTwig are proposed in this thesis.The algorithm of p-TJFast is the improvement of traditional algorithm of TJFast. and the algorithm of p-TJFast can query on probabilistic XML document. Unlike in ordinary XML document, data in probabilistic XML document are uncertain, and there are corresponding probabilities with them. For the purpose of supporting twig pattern matching against probabilistic XML document, we extend extended Dewey encoding scheme by adding Properties of the probability vision. As data in probabilistic XML document have corresponding probabilities with them, the results with low probabilities are abandoned, so there is Filtering operation in the algorithm of p-TJFast. The experiments show the algorithm of p-TJFast is effective in querying on probabilistic XML document.The algorithm of ProFirstTwig has the same encoding scheme as the algorithm of p-TJFast. The elements which have the same tag are sorted by the ascending lexicography order in most algorithms including the algorithm of p-TJFast, but the elements which have the same tag are sorted by their path probability values in the algorithm of ProFirstTwig. For tag stream sorted by path probability, we propose lower bound for every stream. The elements with probabilities lower than lower bound in the corresponding stream are not processed. This greatly reduced the number of elements to be processed. Similarly, there is Filtering operation in the algorithm of ProFirstTwig. The experiments show the algorithm of ProFirstTwig is efficient in querying on probabilistic XML document when Twig pattern is simple.
Keywords/Search Tags:Probabilistic XML document, p-TJFast, ProFirstTwig, match, tag stream
PDF Full Text Request
Related items