
The Study Of XML Documents Clustering Based On The Semantic Tag Tree

Posted on: 2012-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: H M Teng
GTID: 2178330335957074
Subject: Information Science
Abstract/Summary:
Since its release in 1998, XML has gradually become a standard for data representation and data exchange, owing to its simplicity, self-description, extensibility and openness. XML data is now abundant on the web, and XML data mining has become an increasingly active research topic. After introducing XML technology and clustering algorithms for XML documents, this thesis reviews existing work on computing the similarity of XML documents. Current similarity measures rely only on string comparison and do not take semantic information into account. To address this, the thesis proposes a new similarity measure based on the semantic tag tree, which combines structural and semantic information on the basis of paths. First, the method applies WordNet-based word sense disambiguation to the common tags in the documents. It then computes the semantic relatedness of different tags. Finally, it measures document similarity from both the shared tags and the semantic relatedness of the differing tags. Document clustering experiments on real data sets show that the method is effective for clustering XML documents.
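The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the general idea, under stated assumptions. Each document is reduced to its root-to-leaf tag paths. Identical tags score 1, and differing tags score by a semantic relatedness value. Document similarity is then a best-match average over paths.

The `RELATEDNESS` table is a hypothetical stand-in for the WordNet-based word sense disambiguation and relatedness measures used in the thesis. Its values, the position-wise path alignment, and the names `tag_paths`, `tag_sim`, `path_sim` and `doc_sim` are illustrative assumptions, not the author's method.

```python
import xml.etree.ElementTree as ET
from typing import Dict, FrozenSet, List, Tuple

Path = Tuple[str, ...]

# HYPOTHETICAL relatedness scores standing in for a WordNet-based measure
# after word sense disambiguation; the values are illustrative only.
RELATEDNESS: Dict[FrozenSet[str], float] = {
    frozenset({"author", "writer"}): 0.9,
    frozenset({"book", "publication"}): 0.8,
}


def tag_paths(xml_text: str) -> List[Path]:
    """Return every root-to-leaf path of element tags in the document."""
    root = ET.fromstring(xml_text)
    paths: List[Path] = []

    def walk(elem: ET.Element, prefix: Path) -> None:
        current = prefix + (elem.tag,)
        children = list(elem)
        if not children:          # leaf element: record the full path
            paths.append(current)
        for child in children:
            walk(child, current)

    walk(root, ())
    return paths


def tag_sim(a: str, b: str) -> float:
    """1.0 for identical tags, otherwise the (hypothetical) relatedness score."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def path_sim(p: Path, q: Path) -> float:
    """Position-wise tag similarity, normalised by the longer path (assumed scheme)."""
    matched = sum(tag_sim(a, b) for a, b in zip(p, q))
    return matched / max(len(p), len(q))


def doc_sim(xml_a: str, xml_b: str) -> float:
    """Symmetric best-match average of path similarities between two documents."""
    pa, pb = tag_paths(xml_a), tag_paths(xml_b)

    def directed(src: List[Path], dst: List[Path]) -> float:
        # For each source path, take its most similar path in the other document.
        return sum(max(path_sim(p, q) for q in dst) for p in src) / len(src)

    return (directed(pa, pb) + directed(pb, pa)) / 2


doc1 = "<book><title/><author/></book>"
doc2 = "<book><title/><writer/></book>"
doc3 = "<car><model/><engine/></car>"
print(round(doc_sim(doc1, doc2), 3))  # → 0.975
print(doc_sim(doc1, doc3))            # → 0.0
```

With the stand-in scores, the `author`/`writer` mismatch still contributes 0.9 at that position. As a result, the two book records score 0.975. Treating the two tags as unrelated strings would give 0.75 for the same paths. The unrelated `car` record scores 0. A pairwise matrix of such scores could then feed any similarity-based clustering algorithm.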
Keywords/Search Tags: WordNet, semantic similarity, XML, clustering