Research On Approximate Query In XML Documents Based On Attribute Units Extension

Posted on: 2009-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: W B Zhang
Full Text: PDF
GTID: 2178360308977772
Subject: Computer system architecture
Abstract/Summary:
XML (Extensible Markup Language) is increasingly used in Web applications and has become a standard for data interchange over the Internet. Many new approaches to XML querying have been proposed. Nevertheless, traditional approaches have limitations: because they evaluate only the strict query constraints, they often return unsatisfactory results that do not reflect the user's intention under approximate semantic constraints. Approximate queries have therefore been introduced into XML querying. Imprecise queries on XML documents should consider element contents in light of the structural relationships between tagged elements. Existing content-based approximate query approaches for XML ultimately cluster XML elements or map them into similar semantic units, and then build semantic models to answer approximate queries. These approaches require new semantic models in practice, may introduce imprecise class partitions or semantic loss, and have a high query execution cost. An approximate query approach that does not require building a semantic model is therefore needed.

This thesis proposes a new approach, the TwigAE algorithm, for imprecise queries on XML documents. It extracts the leaf elements and the element attributes of an XML document as attribute units. Following the order of importance of the attribute units, it extends the native query constraints into new query constraints, and then executes the new query conditions on the XML document again. The whole process is divided into three parts.

First, an effective algorithm is proposed to find the approximate functional dependencies among the attribute units. Based on the positions of the attribute units within the functional units, a closure algorithm is used to find the approximate candidate keys. The approximate candidate key with the highest support is taken as the approximate key, which partitions the set of attribute units into a deciding set and a dependent set.

Second, the importance of each attribute unit is calculated from the support of these approximate candidate keys, which yields an ordering of the attribute units. With this ordering, the native query conditions can be extended to form approximate queries. More important attribute units are extended later: the attribute units in the dependent set are extended first, and those in the deciding set are extended last.

Finally, the new query conditions are executed on the XML document again. The extension order ensures that the results satisfying the most important conditions appear first.

A series of experiments shows that the TwigAE algorithm obtains much better results for approximate queries than the native queries do. The recall rate and the ordering consistency are also good.
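
The extension order described in the second and third parts can be illustrated with a minimal Python sketch. This is not the TwigAE implementation: it assumes that the records are flat dictionaries of attribute units rather than XML twig matches, that the importance scores and the deciding set are already known, and that "extending" a condition simply means dropping it. The names `relaxation_order`, `relaxed_queries` and `approximate_query` are hypothetical.

```python
from typing import Dict, List, Set


def relaxation_order(importance: Dict[str, float], deciding: Set[str]) -> List[str]:
    """Order attribute units for relaxation.

    Units in the dependent set come first and units in the deciding set
    come last; within each group, less important units come first.
    """
    return sorted(importance, key=lambda unit: (unit in deciding, importance[unit]))


def relaxed_queries(query: Dict[str, str], order: List[str]) -> List[Dict[str, str]]:
    """Return the native query followed by progressively relaxed queries.

    Assumption: relaxing a condition means dropping it. The last remaining
    condition is never dropped, so every query keeps at least one constraint.
    """
    current = dict(query)
    queries = [dict(current)]
    for unit in order:
        if unit in current and len(current) > 1:
            del current[unit]
            queries.append(dict(current))
    return queries


def approximate_query(
    records: List[Dict[str, str]],
    query: Dict[str, str],
    order: List[str],
) -> List[Dict[str, str]]:
    """Run each relaxed query in turn and collect new matches in that order.

    Records that satisfy more of the important conditions are found by the
    earlier, stricter queries and therefore appear earlier in the result.
    """
    seen: Set[int] = set()
    results: List[Dict[str, str]] = []
    for q in relaxed_queries(query, order):
        for index, record in enumerate(records):
            if index in seen:
                continue
            if all(record.get(unit) == value for unit, value in q.items()):
                seen.add(index)
                results.append(record)
    return results


# Illustrative data only; the scores and the deciding set are made up.
records = [
    {"make": "Toyota", "model": "Camry", "color": "red"},
    {"make": "Toyota", "model": "Camry", "color": "blue"},
    {"make": "Toyota", "model": "Corolla", "color": "red"},
    {"make": "Honda", "model": "Civic", "color": "red"},
]
importance = {"make": 0.9, "model": 0.6, "color": 0.2}
deciding = {"make"}

order = relaxation_order(importance, deciding)
ranked = approximate_query(
    records, {"make": "Toyota", "model": "Camry", "color": "red"}, order
)
for record in ranked:
    print(record)
```

With this data the exact match is returned first, followed by the record that differs only in the least important unit, and then the record that also differs in the next unit. The record that violates the deciding-set condition is never returned, because that condition is relaxed last and the sketch never drops the final remaining constraint.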
Keywords/Search Tags: XML, structural joins, approximate key, attribute units, approximate query