
An XML Based Technique For Information Query

Posted on: 2007-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: P Yin
Full Text: PDF
GTID: 2178360182498387
Subject: Circuits and Systems
Abstract/Summary:
In recent years, XML and its applications have attracted growing attention. As more data is organized in XML form, research on XML-based data mining has become popular. Within this field, extracting the needed information from an XML database is a very active direction. This paper discusses XML-based information query. The main work covers the following aspects:

1. Querying a source document T with a user-described DT

Most recent research in this direction focuses on the similarity of two given documents T1 and T2; less attention is given to the querying method itself. We first discuss several algorithms for computing the similarity between XML documents. Because their computational complexity is high, we use the structure of the document to address this problem. We then propose a reverse-direction-route based query method (L-R, which stands for Leaf-Root). We also discuss how to compute the topological match degree (Tmd) between DT and T. The results show that the algorithm is feasible and easy to understand. Another advantage is that the final Tmd results can be scaled to suit different users' needs.

2. Querying an XML database S[T] with a user-described DT

In practice, the most likely case is extracting information similar to a user-described DT from an XML database S[T]. For a huge XML database, it is impossible to compare every document in S[T] with DT. The problem becomes more complex when a user states several needs or when several users state their needs at the same time. It is therefore necessary to pre-process the XML database and pick out the information associated with DT, which is especially useful when there is more than one user-described DT. First, we discuss the pre-processing of the XML database. The basic idea is to combine the user-described DT into the XML database S[T], giving a new document set S'[T]. We then cluster S'[T] to extract the documents whose structure and content are similar to DT. Next, we discuss the PBC clustering method, identify some of its disadvantages, and improve it for use in XML-based query. We give a definition of DTD-Mapping to solve problems such as "lost node" and "topological match" in XML-based query. Our main contribution is that the algorithm still works when the same structure is described in different styles. The path node Ai and the "hierarchical" attribute in the DTD-Mapping help us compute the topological match degree (Tmd) and extract exactly the needed information from the source document T.
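
The abstract does not give the definition of Tmd or the details of the L-R method, so the following is only a rough sketch of the general idea of comparing documents by their leaf-to-root paths. The scoring rule and the names `leaf_to_root_paths`, `common_prefix_len` and `illustrative_tmd` are assumptions made for illustration, not the thesis's algorithm.

```python
import xml.etree.ElementTree as ET


def leaf_to_root_paths(root):
    """Return one tuple of tags per leaf, ordered from the leaf up to the root."""
    paths = []

    def walk(node, ancestors):
        chain = (node.tag,) + ancestors  # current node first, root last
        children = list(node)
        if not children:
            paths.append(chain)
        for child in children:
            walk(child, chain)

    walk(root, ())
    return paths


def common_prefix_len(a, b):
    """Number of leading positions (starting at the leaf) where a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def illustrative_tmd(dt_root, t_root):
    """Hypothetical score in [0, 1]: for each leaf-to-root path of DT, take the
    best leaf-upward overlap with any path of T, normalised by the DT path
    length, then average over all DT paths. This is an assumed stand-in, not
    the Tmd defined in the thesis."""
    dt_paths = leaf_to_root_paths(dt_root)
    t_paths = leaf_to_root_paths(t_root)
    total = 0.0
    for p in dt_paths:
        best = max(common_prefix_len(p, q) for q in t_paths)
        total += best / len(p)
    return total / len(dt_paths)


# Hypothetical example documents.
dt = ET.fromstring("<book><title/><author><name/></author></book>")
t1 = ET.fromstring(
    "<book><title/><author><name/><email/></author><price/></book>"
)
t2 = ET.fromstring("<library><book><title/></book></library>")

print(illustrative_tmd(dt, t1))
print(illustrative_tmd(dt, t2))
```

Walking upward from the leaves means a DT fragment can still match when T wraps the same structure in extra outer elements, as in `t2` above, which is one plausible reason to prefer a leaf-to-root direction over a root-to-leaf one.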
Keywords/Search Tags: Information query, XML, Clustering, topological match degree, PBC