
XML Query And Normalization Research Based On Semantics

Posted on: 2011-03-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X D Lin
Full Text: PDF
GTID: 1118330332475565
Subject: Computer application technology
ABSTRACT: As XML has come into wide use in many fields, building sophisticated XML-based data management systems has become a goal for many researchers. Query methods and normalization mechanisms suited to the characteristics of XML data are the foundations that make such systems practical. We therefore choose XML query and XML normalization as the two main research directions of this dissertation.

XML query research follows two directions: structured query and keyword query. Structured query relies on path matching between queries and XML nodes, so users cannot formulate queries unless they know the structure of the XML data, and existing queries are vulnerable to structural changes. Keyword query methods can only match keywords against the labels of XML nodes and cannot guarantee the accuracy of the query results.

XML normalization research mainly uses the path information of XML nodes as its foundation. This brings expressive deficiencies, such as lengthy expressions, unclear meaning, and vulnerability to structural changes, as well as semantic deficiencies, such as the inclusion of irrelevant nodes and missing or redundant constraints.
Therefore, these approaches cannot be relied on to describe and analyze complex XML data relationships completely.

In response to these shortcomings, the main improvement of our research is to make full use of the semantic information of XML nodes and of users' common-sense knowledge. This yields notable gains in the practicability, effectiveness, and accuracy of XML query methods, in expressing XML data dependencies, and in eliminating XML data redundancies. In detail, we carry out the following work:

(1) The concept of entity segments is introduced into the XML data model, and XML nodes are correlated to entity segments. This gives the XML data model the ability to reflect XML node semantics and forms the basis for semantic-based XML query and XML normalization research. The data storage characteristics of XML documents are re-examined, and the concept of entity segments is proposed based on a categorization of XML nodes. XML documents are treated not only as hierarchical structures of independent nodes but also as hierarchical structures of entity segments. Furthermore, correlating XML nodes to entity segments gives the nodes semantic features, which is the foundation on which our semantics-based research on XML query and XML normalization rests.

(2) An XML query method based on keyword grouping and categorization is proposed. It performs semantic matching queries when keyword semantics are provided by users, which markedly improves the practicability and accuracy of XML query. Firstly, a new XML query language, the grouping and categorization keyword expression, is proposed.
Using this language, users can give keywords explicit semantics based on common-sense knowledge, without knowing the XML structure or mastering a complex grammar. Secondly, various operators are introduced into grouping and categorization keyword expressions to strengthen their semantic expressiveness. Thirdly, a new XML encoding scheme, C-Dewey encoding, is proposed to identify the relationships between XML nodes and entity segments. Finally, the FRQI query algorithm is proposed to realize semantic matching between the keywords in a query and the XML nodes. The experimental results show that, while remaining efficient, the FRQI algorithm returns query results that closely match users' query intentions.

(3) A two-phase XML keyword query method is proposed to perform semantic matching queries when no keyword semantics are provided by users, which improves the applicability of the semantic matching XML query method. Firstly, exploiting the fact that a large number of XML nodes correspond to a limited set of semantics, XML node semantics are formalized as semantic triples. Secondly, a novel XML index structure, the node semantic index, is built on the semantic triples, which makes it possible to search XML nodes by their semantics. Finally, the TPKQ query algorithm is proposed to realize the semantic matching query. In the first phase of the algorithm, the query is executed on the XML semantic set; the query is then expanded to the whole XML documents. The experimental results show that, compared with the traditional keyword query algorithm, the FRQI algorithm returns more accurate query results efficiently.

(4) An XML normalization solution based on XML node semantics is proposed. The solution improves the way XML data dependencies are expressed and the effectiveness of XML normal forms in eliminating data redundancies.
Firstly, the concept of XML attribute dependencies is proposed to express XML data dependency relationships, so that XML nodes are identified in a way that reflects their semantics. Secondly, building on this improved way of expressing XML data dependencies, we propose new definitions of XML key and XML normal form. Finally, the effectiveness of the proposed XML normal form in eliminating data redundancies is proved by the "missing-restore" method.
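
The abstract does not give the formal definition of an XML attribute dependency, so the following is only a minimal sketch of the underlying idea. It assumes that a dependency behaves like a classical functional dependency over the attributes of one kind of entity segment, and that entity instances have already been flattened into Python dictionaries. The function names and the sample `courses` data are hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict


def dependency_holds(records, lhs, rhs):
    """Return True if records agreeing on all lhs attributes also agree on all rhs attributes."""
    seen = {}
    for rec in records:
        key = tuple(rec[a] for a in lhs)
        val = tuple(rec[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value;
        # any later record with a different rhs value violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True


def redundant_copies(records, lhs):
    """Count how many records repeat an lhs value that was already stored."""
    counts = defaultdict(int)
    for rec in records:
        counts[tuple(rec[a] for a in lhs)] += 1
    return sum(n - 1 for n in counts.values())


# Hypothetical entity instances, as if extracted from "course" entity segments.
courses = [
    {"course_id": "c1", "teacher_id": "t1", "teacher_name": "Li"},
    {"course_id": "c2", "teacher_id": "t1", "teacher_name": "Li"},
    {"course_id": "c3", "teacher_id": "t2", "teacher_name": "Wang"},
]

name_depends_on_teacher = dependency_holds(courses, ["teacher_id"], ["teacher_name"])
course_depends_on_name = dependency_holds(courses, ["teacher_name"], ["course_id"])
repeated_teacher_entries = redundant_copies(courses, ["teacher_id"])
```

In this sample, `teacher_name` depends on `teacher_id`, but `teacher_id` is not a key of the course instances, so the teacher's name is stored once more than necessary. This is the kind of redundancy a normal form is meant to rule out, here shown under the assumptions stated above.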
Keywords/Search Tags: Extensible markup language (XML), Entity segment, Semantic matching, Keyword query, Data dependency, Normalization