Clustering Research Of XML Document

Posted on:2016-11-08

Degree:Master

Type:Thesis

Country:China

Candidate:L X Yin

GTID:2308330461495590

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet, XML has become one of the Internet's most popular languages for data exchange and storage, and extracting valuable information from large collections of XML documents is currently an active research topic. In research on XML document clustering, one line of work improves the model of an XML document in order to obtain a more effective method of calculating the similarity between XML documents. Several models currently exist for calculating XML document similarity, such as the SET/BAG model, the vector space model (VSM), and tree models, and each model supports a variety of similarity calculation methods.

This thesis describes the basics of text clustering and its applications, and analyzes commonly used text clustering algorithms together with their advantages and disadvantages. It then introduces the basic models for XML document similarity and their similarity calculation methods, and analyzes the advantages and disadvantages of each method.

The thesis proposes an improved similarity calculation method based on the SET/BAG model. The method converts each node of an XML document into an object made up of the object name, the parent object, and the weight of the object relative to its parent. This representation expresses the structural information of the XML document more completely, and adjusting the weights of duplicate nodes reduces their impact on the similarity calculation.

Experiments were carried out on a real data set and a manually constructed data set, and the clustering results were evaluated using recall and precision. The proposed method was compared with the tree edit distance method and a node comparison method. The results show that clustering with the similarity calculation method based on the improved SET/BAG model achieves good results.
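The node-object idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual formulation, which the abstract does not give. The sketch assumes that a node object is identified by its tag name and its parent's tag name, that a child's weight is a fixed fraction of its parent's weight (`level_factor`), that each repeated occurrence of the same object contributes a geometrically smaller weight (`dup_decay`), and that two weighted bags are compared with a weighted Jaccard ratio. The function names and both parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def node_objects(xml_text, level_factor=0.5, dup_decay=0.5):
    """Flatten an XML document into a weighted bag of (name, parent name) objects.

    Assumptions (not taken from the thesis): a child weighs level_factor
    times its parent, and the k-th repeat of the same object is scaled by
    dup_decay ** k so that duplicate nodes count for less.
    """
    root = ET.fromstring(xml_text)
    bag = {}
    seen = Counter()  # how many times each object has appeared so far

    def visit(elem, parent_name, weight):
        key = (elem.tag, parent_name)
        # Damp the contribution of repeated objects.
        bag[key] = bag.get(key, 0.0) + weight * dup_decay ** seen[key]
        seen[key] += 1
        for child in elem:
            visit(child, elem.tag, weight * level_factor)

    visit(root, None, 1.0)
    return bag


def similarity(bag_a, bag_b):
    """Weighted Jaccard similarity of two bags: sum of minima over sum of maxima."""
    keys = set(bag_a) | set(bag_b)
    shared = sum(min(bag_a.get(k, 0.0), bag_b.get(k, 0.0)) for k in keys)
    total = sum(max(bag_a.get(k, 0.0), bag_b.get(k, 0.0)) for k in keys)
    return shared / total if total else 1.0


if __name__ == "__main__":
    doc_a = "<book><title/><author/><author/><author/></book>"
    doc_b = "<book><title/><author/></book>"
    print(similarity(node_objects(doc_a), node_objects(doc_b)))
```

Setting `dup_decay=1.0` turns off the damping, which makes it easy to see how much repeated nodes would otherwise pull two structurally similar documents apart. The resulting pairwise similarities could then be passed to any standard clustering algorithm.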

Keywords/Search Tags:

XML, Document Clustering, Similarity Computation

Related items

1	Research Of XML Document Clustering
2	Clustering Research Of XML Document
3	Document Clustering Method Based On WAF
4	Effects of similarity metrics on document clustering
5	Research On Efficient Document Clustering Using Improvised Sub-Document Based Framework
6	Research On Document Clustering Based On Semantic Similarity Of Hownet
7	The Research On Chinese Sentence Similarity Algorithm Based On HNC
8	The Research Of Enterprise Document Retrieval Model Based On Ontology
9	Research And Implementation Of Topic-based Document Data Collection System
10	Web Document Automatic Classification Based On Keywords