An XML Document Clustering Method Based on Bitized Depth Difference Sequences

Posted on: 2015-03-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Wang
GTID: 2268330428983195 | Subject: Computer software and theory
Abstract/Summary:
Information technology has had a great impact on modern life. Devices connected to the Internet generate vast amounts of data every day, and the volume of data requested among these devices is also very large. Because different devices and data sources use different formats and structures, a unified standard format is needed as a medium for data exchange, and XML is one such tool. XML, short for "Extensible Markup Language", is an increasingly widely used standard for data storage and data exchange.

Research on XML documents has produced many results both in China and abroad, but most of it treats XML documents as static objects and ignores the dynamic processes that exist between them. In practice, an XML file used as a data exchange tool often changes frequently; these changing documents evolve from one or several original XML documents, and usually only part of their structure changes. Studying such dynamic XML documents, and mining both their static information and the dynamic processes between versions, is therefore highly meaningful.

In this paper, we first analyze general data mining methods for XML documents and the current state of research on XML document clustering. We focus on clustering methods for both static and dynamic XML documents. We propose a method for evaluating the structural similarity of XML documents based on a bitized depth difference sequence, called the DepDS algorithm. Experiments show that this method works well on static XML clustering problems. We also apply it to clustering dynamic XML documents: to calculate the structural similarity of dynamic XML documents, we propose a new method based on the DepDS algorithm, which we call REDS.
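The abstract does not define DepDS in detail, so the following Python sketch only illustrates the general idea of a depth difference sequence under stated assumptions: element depths are taken in document (pre-)order with the root at depth 1, and the sequence is the list of differences between consecutive depths. The thesis's bitization step is omitted because the abstract does not describe it, and the similarity score is a stand-in (a normalized longest-common-subsequence ratio), not the thesis's measure. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def depth_sequence(xml_text: str) -> List[int]:
    """Depths of all elements in document (pre-)order; the root has depth 1."""
    root = ET.fromstring(xml_text)
    depths: List[int] = []

    def visit(elem: ET.Element, depth: int) -> None:
        depths.append(depth)
        for child in elem:
            visit(child, depth + 1)

    visit(root, 1)
    return depths


def depth_difference_sequence(xml_text: str) -> List[int]:
    """Differences between consecutive depths (assumed definition; no bitization)."""
    d = depth_sequence(xml_text)
    return [b - a for a, b in zip(d, d[1:])]


def lcs_length(a: List[int], b: List[int]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def structure_similarity(doc_a: str, doc_b: str) -> float:
    """Stand-in similarity in [0, 1]: 2 * LCS / (len(a) + len(b))."""
    a = depth_difference_sequence(doc_a)
    b = depth_difference_sequence(doc_b)
    if not a and not b:
        return 1.0  # two single-element documents have identical (empty) sequences
    return 2 * lcs_length(a, b) / (len(a) + len(b))
```

For example, `<a><b/><c><d/></c></a>` has depths 1, 2, 2, 3 and therefore the difference sequence `[1, 0, 1]`; compared with `<a><b/><c/></a>` (sequence `[1, 0]`), the LCS has length 2 and the score is 2·2/(3+2) = 0.8. Working on depth differences rather than raw depths makes the sequence insensitive to tag names, so only the shape of the tree is compared.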
We introduce the concept of relative entropy to measure the changes of a dynamic XML document across its versions, normalize the resulting curve of relative entropy values, and select the flat segments of the curve. We then use approximate representation sets to characterize the structure of the dynamic XML document, treat these sets as the sequence representing that document, and cluster the documents with DepDS, our static XML document clustering method. Experimental results show that the algorithm achieves the desired clustering effect.
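The REDS steps above can be sketched in the same hedged way. This Python sketch assumes that each version is summarized by its element-tag frequency distribution (the abstract does not say which distribution the relative entropy is computed over), computes the relative entropy (Kullback-Leibler divergence) between consecutive versions with a small smoothing constant `eps`, min-max normalizes the resulting curve, and reports flat segments where adjacent normalized values differ by at most `tol`. The parameters `eps`, `tol`, and `min_len` and all function names are hypothetical, and building the approximate representation sets from those segments is left out because the abstract does not specify it.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Tuple


def tag_distribution(xml_text: str) -> Counter:
    """Assumed feature: frequency of each element tag in one document version."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())


def relative_entropy(p: Counter, q: Counter, eps: float = 1e-9) -> float:
    """D(P || Q) over the union of tags; eps is hypothetical additive smoothing."""
    tags = set(p) | set(q)
    p_total = sum(p.values()) + eps * len(tags)
    q_total = sum(q.values()) + eps * len(tags)
    d = 0.0
    for t in tags:
        pt = (p[t] + eps) / p_total  # Counter returns 0 for missing tags
        qt = (q[t] + eps) / q_total
        d += pt * math.log(pt / qt)
    return d


def normalized_curve(versions: List[str]) -> List[float]:
    """Relative entropy between consecutive versions, min-max normalized to [0, 1]."""
    dists = [tag_distribution(v) for v in versions]
    curve = [relative_entropy(a, b) for a, b in zip(dists, dists[1:])]
    if not curve:
        return []
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(c - lo) / (hi - lo) for c in curve]


def flat_segments(curve: List[float], tol: float = 0.1,
                  min_len: int = 2) -> List[Tuple[int, int]]:
    """Maximal runs (start, end), inclusive, where adjacent values differ by <= tol."""
    segments: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(curve) + 1):
        if i == len(curve) or abs(curve[i] - curve[i - 1]) > tol:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = i
    return segments
```

For instance, `flat_segments([0.0, 0.05, 0.1, 1.0, 0.95])` returns `[(0, 2), (3, 4)]`: the jump from 0.1 to 1.0 splits the curve into two flat runs. In this sketch, each flat run would mark a stretch of versions whose structure changes little, which is where a representative structure for the document could be chosen before applying a DepDS-style comparison.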
Keywords/Search Tags: Depth difference sequence, Relative entropy, XML document, Data mining