Research On Approximate XML Joins

Posted on:2013-08-07

Degree:Master

Type:Thesis

Country:China

Candidate:F He

Full Text:PDF

GTID:2268330392467826

Subject:Computer Science and Technology

Abstract/Summary:

With the growth of the Internet, more and more information is published online. XML (eXtensible Markup Language) is one of the most popular formats for exchanging and storing data on the network. XML documents from different sources may represent the same or similar information, which produces a large amount of redundancy. Integrating the same or similar information is worthwhile, because users can remove the redundant parts from the integrated XML documents and obtain more complete and useful information.

This thesis introduces several XML similarity measures and presents a new XML similarity measure based on XML subtree matching. The subtree similarity measure considers not only the PCDATA values of the subtree's leaf nodes but also the path similarity of the matching leaf nodes, so subtree similarity is defined in terms of both text and path similarity. Based on subtree similarity, the thesis proposes an XML similarity measure algorithm and an XML similarity join algorithm. Experimental results show that the subtree similarity calculation helps when joining XML documents.

Most XML clustering algorithms are based on tree edit distance and compare every pair of XML documents, so clustering time increases dramatically as the number of documents grows. This thesis adds semantic information to the XML hierarchical structure and, based on that hierarchical structure, proposes a new XML document similarity measure. With some modifications, the CLOPE incremental clustering method can be applied to XML document clustering without comparing each pair of documents. Experiments show that the incremental XML clustering method avoids pairwise comparison of the XML documents and greatly improves the efficiency of XML clustering.
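The abstract does not give the exact definition of subtree similarity, so the following Python sketch is only an illustration of the general idea of combining leaf text similarity with path similarity and using the result in a threshold join. Every name here (`leaf_paths`, `leaf_similarity`, `subtree_similarity`, `similarity_join`), the weight `alpha`, the `threshold`, the best-match averaging, and the use of `difflib.SequenceMatcher` as the string and path comparison are assumptions made for this example, not the thesis's algorithm.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def leaf_paths(root):
    """Collect (path, text) for every leaf element; path is a tuple of tag names."""
    leaves = []

    def walk(node, prefix):
        path = prefix + [node.tag]
        children = list(node)
        if not children:
            leaves.append((tuple(path), (node.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(root, [])
    return leaves


def ratio(a, b):
    """Similarity in [0, 1] between two sequences (strings or tag tuples)."""
    return SequenceMatcher(None, a, b).ratio()


def leaf_similarity(leaf_a, leaf_b, alpha=0.5):
    """Weighted mix of text similarity and path similarity (alpha is assumed)."""
    (path_a, text_a), (path_b, text_b) = leaf_a, leaf_b
    text_sim = ratio(text_a.lower(), text_b.lower())
    path_sim = ratio(path_a, path_b)
    return alpha * text_sim + (1 - alpha) * path_sim


def subtree_similarity(xml_a, xml_b, alpha=0.5):
    """Average best-match leaf similarity, symmetrized over both directions."""
    leaves_a = leaf_paths(ET.fromstring(xml_a))
    leaves_b = leaf_paths(ET.fromstring(xml_b))
    if not leaves_a or not leaves_b:
        return 0.0

    def directed(src, dst):
        best = [max(leaf_similarity(s, d, alpha) for d in dst) for s in src]
        return sum(best) / len(src)

    return (directed(leaves_a, leaves_b) + directed(leaves_b, leaves_a)) / 2


def similarity_join(docs_a, docs_b, threshold=0.8):
    """Naive nested-loop join: index pairs whose similarity meets the threshold."""
    return [
        (i, j)
        for i, a in enumerate(docs_a)
        for j, b in enumerate(docs_b)
        if subtree_similarity(a, b) >= threshold
    ]
```

Calling `similarity_join(left_docs, right_docs)` on two lists of XML strings returns the index pairs judged similar. Note that this nested-loop join compares every pair of documents, which is exactly the cost the incremental clustering approach described above is meant to avoid.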

Keywords/Search Tags:

Subtree matching, XML join, Similarity measure, Cluster analysis

Related items

1	Research On Complex Distance Measure Based MapReduce Similarity Join Techniques
2	Similarity Measures In Cluster Analysis And Its Applications
3	Research And Implementation Of High Efficiency Set Similarity Join Algorithm Based On Overlap Similarity
4	Research On Improvement Of Similarity Join In MapReduce
5	Cluster Analysis And Its Application On Image Processing
6	Research On Self-Similarity Join In Heterogeneous Networks
7	Aggregate Queries On Constrained Probabilistic Similarity Join Pairs
8	Research And Implementation Of Similarity Join For Big Data
9	A Connection And Combination Based Research For Subtree Mining
10	Optimizing Top-k String Similarity Join Algorithm