
Research On Query Optimization And Correlative Technologies In XML Database

Posted on: 2007-09-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Sun
GTID: 1118360185466738
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, a large amount of Web data has emerged, and this data is commonly represented as XML documents. How to store and process large XML documents effectively, and how to retrieve information from them, has become an important research topic in the database field. The research in this thesis centers on query optimization techniques for XML databases, focusing on XML query optimization based on schema and semantic information.

To address the problem of inaccurate schemas, a schema-mining method for XML based on fuzzy decision trees is proposed. After an analysis of the problems and defects of existing schema-mining methods, the concept of an approximate schema is introduced. XML documents are expressed as a monadic Datalog program. Using an incremental clustering method, the approximate schema is constructed by clustering objects with similar incoming and outgoing edge patterns, and a fuzzy decision tree over the classified objects then yields their perfect schema. The method overcomes both defects of schema mining: excess and deficit.

A method for discovering data dependencies in XML based on rough sets is proposed. Data dependency, which includes functional dependency and multivalued dependency, is an important concept in database research. The thesis defines functional and multivalued dependencies for XML and gives theorems for determining XML functional dependencies and XML multivalued dependencies based on the indiscernibility relation of rough sets. Based on these theorems, algorithms for discovering data dependencies are proposed.

An algorithm for optimizing regular path expression queries based on the DTD is proposed. The concept of an extended regular path expression is defined and used to reduce the DTD, and the concept of an entrance node is defined. Based on the entrance-node notion, two principles for optimizing regular path expressions are proposed, named path shortening and path complementing. Using these two kinds...
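The clustering step behind the approximate schema can be illustrated with a small sketch. The abstract does not give the thesis's similarity measure, threshold, or monadic Datalog encoding, so this sketch makes its own assumptions: an element's edge pattern is its incoming tag plus its outgoing child tags, similarity is Jaccard similarity against a cluster's accumulated signature, and the cut-off `THRESHOLD = 0.6` is arbitrary. The names `edge_patterns`, `incremental_cluster`, and `LIB_XML` are hypothetical, and the fuzzy-decision-tree step that produces the perfect schema is not shown.

```python
import xml.etree.ElementTree as ET

# Arbitrary similarity cut-off for this sketch; the thesis's value is not given.
THRESHOLD = 0.6

# Hypothetical sample document.
LIB_XML = """<lib>
  <book><title>A</title><author>X</author><year>2001</year></book>
  <book><title>B</title><author>Y</author></book>
  <journal><title>C</title><issn>1</issn></journal>
</lib>"""


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def edge_patterns(root):
    """Yield (element, pattern) for every element in document order.

    Assumed pattern: ('in', tag) for the edge that reaches the element,
    plus ('out', child_tag) for each outgoing child edge."""
    for elem in root.iter():
        pattern = {("in", elem.tag)}
        pattern |= {("out", child.tag) for child in elem}
        yield elem, frozenset(pattern)


def incremental_cluster(root, threshold=THRESHOLD):
    """Single pass: put each element into the most similar existing cluster
    if the similarity reaches the threshold, otherwise open a new cluster."""
    clusters = []  # each cluster: {"signature": set of edges, "members": [elements]}
    for elem, pattern in edge_patterns(root):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(pattern, cluster["signature"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["signature"] |= pattern  # signature accumulates all member edges
            best["members"].append(elem)
        else:
            clusters.append({"signature": set(pattern), "members": [elem]})
    return clusters


if __name__ == "__main__":
    for cluster in incremental_cluster(ET.fromstring(LIB_XML)):
        tags = sorted({m.tag for m in cluster["members"]})
        print(tags, len(cluster["members"]), sorted(cluster["signature"]))
```

With this sample, the two `book` elements share one cluster (similarity 0.75) even though only one has a `year` child, while `journal` gets its own cluster. Raising the threshold above 0.75 would split the books, which shows in miniature how an approximate schema trades extra types against missing ones.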
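The rough-set view of functional dependency can be sketched on flattened data. For a set of objects U, X → Y holds exactly when the indiscernibility relation IND(X) is contained in IND(Y); because U/IND(X ∪ Y) always refines U/IND(X), this is equivalent to the two partitions having the same number of blocks. The sketch assumes the XML has already been flattened into one row per record element: the helper `flatten`, the sample `COURSES_XML`, and `discover_fds` are hypothetical, whereas the thesis defines XML dependencies in its own notation. Only minimal FDs with a single right-hand attribute are searched, and multivalued dependencies are not covered.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical sample data: one <course> element per record.
COURSES_XML = """<courses>
  <course><cno>c1</cno><title>DB</title><teacher>Li</teacher><room>101</room></course>
  <course><cno>c2</cno><title>OS</title><teacher>Li</teacher><room>101</room></course>
  <course><cno>c3</cno><title>DB</title><teacher>Wang</teacher><room>202</room></course>
</courses>"""


def flatten(root, record_tag):
    """Hypothetical flattening: one row per <record_tag>, mapping child tag -> text."""
    return [{child.tag: (child.text or "").strip() for child in rec}
            for rec in root.iter(record_tag)]


def partition(rows, attrs):
    """U/IND(attrs): blocks of row indices that agree on every attribute in attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row.get(a) for a in attrs)
        blocks.setdefault(key, set()).add(i)
    return list(blocks.values())


def holds(rows, lhs, rhs):
    """X -> Y iff IND(X) is contained in IND(Y), i.e. adding Y splits no block of U/IND(X)."""
    return len(partition(rows, list(lhs))) == len(partition(rows, list(lhs) + list(rhs)))


def discover_fds(rows, max_lhs=2):
    """Minimal FDs lhs -> a with a single right-hand attribute and |lhs| <= max_lhs."""
    attrs = sorted({a for row in rows for a in row})
    found = []
    for rhs in attrs:
        others = [a for a in attrs if a != rhs]
        for size in range(1, max_lhs + 1):
            for lhs in combinations(others, size):
                # a smaller LHS that already determines rhs makes this one non-minimal
                if any(r == rhs and set(l) <= set(lhs) for l, r in found):
                    continue
                if holds(rows, lhs, [rhs]):
                    found.append((lhs, rhs))
    return found


if __name__ == "__main__":
    rows = flatten(ET.fromstring(COURSES_XML), "course")
    for lhs, rhs in discover_fds(rows):
        print(", ".join(lhs), "->", rhs)
```

On this sample, `teacher -> room` is accepted, while `title -> room` is rejected because the two `DB` records have different rooms, so adding `room` splits the `DB` block.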
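A DTD can be used to replace a descendant step in a path expression by the explicit child paths it permits. This is a generic schema-based rewrite, included only to make DTD-based regular path optimization concrete; it is not the thesis's entrance-node method, whose path-shortening and path-complementing rules are only named in the abstract. The sketch assumes a non-recursive DTD already reduced to an element-to-children graph; `SAMPLE_DTD`, `descendant_paths`, and `rewrite_descendant` are hypothetical.

```python
# Hypothetical non-recursive DTD, already reduced to element -> child elements.
SAMPLE_DTD = {
    "lib": ["book", "journal"],
    "book": ["title", "author"],
    "journal": ["title", "issn"],
    "author": ["name"],
    "title": [],
    "issn": [],
    "name": [],
}


def descendant_paths(dtd, start, target, _ancestors=()):
    """All label paths the DTD allows for start//target (one or more child steps)."""
    if start in _ancestors:
        raise ValueError(f"recursive DTD at {start!r}: expansion would not be finite")
    paths = []
    for child in dtd.get(start, []):
        if child == target:
            paths.append([start, child])
        for tail in descendant_paths(dtd, child, target, _ancestors + (start,)):
            paths.append([start] + tail)
    return paths


def rewrite_descendant(dtd, start, target):
    """Rewrite start//target as a union of explicit child paths,
    or return None when the DTD allows no match at all."""
    paths = descendant_paths(dtd, start, target)
    if not paths:
        return None
    return " | ".join("/".join(p) for p in paths)


if __name__ == "__main__":
    for target in ("name", "title", "issn"):
        print(f"lib//{target}  ->  {rewrite_descendant(SAMPLE_DTD, 'lib', target)}")
```

When `start//target` has exactly one expansion, the query becomes a plain child path that needs no descendant search. An empty expansion means the DTD rules out any answer, so the query can be answered without reading the data.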
Keywords/Search Tags:XML Database, Query Optimization, XML Algebra, Schema Mining, Data Dependency, Access Control Rule