
Study On Some Key Techniques Of Non-fully Structured XML Query Processing

Posted on: 2007-09-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X G Li
Full Text: PDF
GTID: 1118360185977713
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of Internet/Intranet technology and of techniques for integrating and storing heterogeneous information, huge amounts of semi-structured data, such as XML documents, are emerging on the network. Because XML is self-describing and has a flexible data structure, it is becoming one of the standards for data definition, storage and exchange. As a key technique for the effective management of XML documents, non-fully structured XML query processing has recently attracted the attention of more and more researchers.

Non-fully structured XML query (NFS query) is a technique for querying XML documents when full structural information is lacking. An NFS query addresses situations in which the user does not fully know the structure of an XML document, the document provides no structural information, or the documents are heterogeneous. In each of these situations, a user cannot write a regular query that expresses his intention accurately. In practice, especially on the Internet/Intranet, most XML documents lack structural information or are heterogeneous, so NFS query has become more and more popular in recent years. This dissertation studies in depth two key techniques of non-fully structured XML query processing: the determination of meaningful query results and content-based result clustering.

The determination of meaningful query results is a very important step in NFS query. Most of the determination methods in previous work, such as the Interconnection Relationship in the XSEarch system and MLCA in the Timber system, are proposed from a particular point of view, so they apply to only some kinds of XML documents. Moreover, they become infeasible for large-scale documents: for example, both the time needed to build the index in XSEarch and the query time in Timber are far beyond what users can tolerate.

This dissertation proposes a general determination model based on the concepts of pattern and instance, called the PE model. The PE model is a system-oriented model and can be widely accepted by users.
In fact, the PE model is simply a scalable framework that is independent of the definitions of equivalent pattern and equivalent query term. Under the framework of the PE model, this dissertation proposes a method based on structural similarity to compute equivalent patterns, and puts forward a determination rule. To improve the efficiency of NFS querying, this dissertation...
Keywords/Search Tags: XML document databases, XML query, Non-fully structured XML query, Document clustering, Clustering skew, Feature reduction, Information theory