The Study Of XML Documents Clustering Based On The Semantic Tag Tree

Posted on:2012-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:H M Teng

Full Text:PDF

GTID:2178330335957074

Subject:Information Science

Abstract/Summary:

Since its release in 1998, XML has gradually become a standard for data representation and data exchange, owing to its simplicity, self-description, extensibility, and openness. XML data now floods the web, and XML data mining has become an increasingly popular research topic. After introducing XML technology and clustering algorithms for XML documents, the paper reviews existing work on XML document similarity computation. Current methods for measuring document similarity rely only on string comparison and do not consider semantic information. To address this, the paper proposes a new similarity measure based on the semantic tag tree. The method computes similarity from both structural and semantic information on the basis of paths. First, it applies WordNet-based word sense disambiguation to the common tags in the documents. It then computes the semantic relatedness of differing tags and measures document similarity using both the identical tags and the semantic relatedness of the differing tags. Finally, the paper reports document clustering experiments on real data sets, which show that the method is effective for XML document clustering.
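
The abstract gives no formulas, so the following is only a minimal sketch of the general idea of a path-based similarity that credits both identical tags and semantically related tags. It is not the thesis's actual method. The function names, the position-wise path alignment, the averaging scheme, and the `RELATEDNESS` table are all assumptions made for illustration; the table in particular is a hand-written stand-in for scores that would come from WordNet after word sense disambiguation.

```python
import xml.etree.ElementTree as ET

# Hypothetical relatedness scores between distinct tags. In the approach the
# abstract describes, such values would be derived from WordNet; here they are
# hard-coded so the sketch runs without external data.
RELATEDNESS = {
    frozenset(("author", "writer")): 0.9,
    frozenset(("book", "volume")): 0.8,
}


def tag_paths(xml_text):
    """Return the set of root-to-leaf tag paths of an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(root, ())
    return paths


def tag_sim(a, b):
    """1.0 for identical tags, otherwise the assumed relatedness (default 0.0)."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset((a, b)), 0.0)


def path_sim(p, q):
    """Sum tag similarities position by position, normalised by the longer path."""
    total = sum(tag_sim(a, b) for a, b in zip(p, q))
    return total / max(len(p), len(q))


def doc_sim(x, y):
    """Average, over all paths of both documents, of each path's best match."""
    px, py = tag_paths(x), tag_paths(y)
    best_x = sum(max(path_sim(p, q) for q in py) for p in px)
    best_y = sum(max(path_sim(q, p) for p in px) for q in py)
    return (best_x + best_y) / (len(px) + len(py))


if __name__ == "__main__":
    doc_a = "<book><author/><title/></book>"
    doc_b = "<volume><writer/><title/></volume>"
    doc_c = "<order><item/><price/></order>"
    print(doc_sim(doc_a, doc_b))  # related tags give a high score
    print(doc_sim(doc_a, doc_c))  # unrelated tags give a low score
```

A pure string comparison would score `doc_a` and `doc_b` low because only `title` matches, whereas the relatedness table lets `book`/`volume` and `author`/`writer` contribute. A pairwise similarity of this kind could then be passed to any standard clustering algorithm.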

Keywords/Search Tags:

WordNet, semantic similarity, XML, cluster

Related items

1	Research On Semantic Similarity Between Words And Between Short Texts Based On WordNet
2	Research And Application Of Wordnet-Based Semantic Similarity Measurement
3	The Research Of Semantic Similarity Between Short Text Based On WordNet
4	The Study Of XML Documents Clustering Based On The Semantic Tag Tree
5	Research Of English Sentence Similarity Measure Based On Wordnet
6	Multiple Semantic-based Similarity And Relatedness Measurements In WordNet
7	Research And Implementation Of Semantic Similarity Computing By Combining Knowledge-based And Corpus-based Methods
8	Research On Sentence Semantic Similarity Based On WordNet In Automatic Question Answering System
9	Study On Concept Semantic Similarity Measure Based On Ontology
10	Research Of Multi-Documents Summarization Based On Information Extraction And Semantic Similarity