Document Clustering Based On The Semantic Network Of Forestry Thesaurus

Posted on:2011-06-15

Degree:Master

Type:Thesis

Country:China

Candidate:L R Li

GTID:2178360305464321

Subject:Computer application technology

Abstract/Summary:

From the perspective of ontology semantics, this thesis attempts to improve the measurement of document similarity by using the semantic knowledge contained in an ontology. It combines ontological semantics with document clustering in order to improve clustering quality. To this end, a thesaurus-based document clustering method is proposed, in which the thesaurus serves as a kind of ontology.

First, document features are extracted and collected with the help of the thesaurus, and the processed documents are represented as TF-IDF (Term Frequency-Inverse Document Frequency) vectors. Next, the similarity between terms is calculated from the semantic relations among them in the thesaurus. The similarity between documents is then computed from both the TF-IDF weights and these term relationships. Finally, the documents are clustered with the K-means algorithm.

The thesis studies and discusses the key technologies involved in document clustering, including the Vector Space Model, feature extraction and collection, term similarity calculation, and document similarity calculation.

The experiments are designed and implemented on data from the Chinese-English Forestry Thesaurus and the Chinese Forestry Literature Database, and their results are compared with those of the clustering method that does not use the thesaurus. The results show that the thesaurus-based method clearly improves on the clustering method without the thesaurus.
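The pipeline described above (TF-IDF vectors, thesaurus-based term similarity, semantic document similarity, K-means) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the thesaurus fragment, the relation codes `UF`/`BT`/`RT`, and their weights are all hypothetical, and documents are assumed to be already reduced to thesaurus terms. The abstract does not give the document-similarity formula, so the sketch swaps in the soft cosine measure, in which related terms (not only identical ones) contribute to the score. The clustering step is a K-means variant that assigns documents by highest soft-cosine similarity to a mean-vector centroid. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical relation weights (assumption, not taken from the thesis):
# UF = equivalence/synonym, BT = broader term, RT = related term.
RELATION_WEIGHT = {"UF": 0.9, "BT": 0.6, "RT": 0.4}

# A toy, hypothetical fragment of a forestry thesaurus: (term, relation, term).
THESAURUS = [
    ("timber", "UF", "lumber"),
    ("pine", "BT", "conifer"),
    ("larch", "BT", "conifer"),
    ("timber", "RT", "wood"),
    ("beetle", "BT", "insect"),
    ("pest", "RT", "insect"),
]

def build_term_similarity(triples):
    """Map each (ordered) term pair to a score derived from its relation type."""
    sim = {}
    for a, rel, b in triples:
        w = max(sim.get((a, b), 0.0), RELATION_WEIGHT[rel])
        sim[(a, b)] = sim[(b, a)] = w  # relations are treated as symmetric
    return sim

def term_sim(a, b, sim):
    """Identical terms score 1; unrelated terms score 0."""
    return 1.0 if a == b else sim.get((a, b), 0.0)

def tfidf(docs):
    """docs: token lists (features already extracted). Returns {term: weight} dicts."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def soft_dot(u, v, sim):
    """Inner product in which every pair of terms contributes w_u * w_v * sim."""
    return sum(wu * wv * term_sim(a, b, sim) for a, wu in u.items() for b, wv in v.items())

def soft_cosine(u, v, sim):
    """Soft cosine similarity; with an empty `sim` it reduces to plain cosine."""
    denom = math.sqrt(soft_dot(u, u, sim) * soft_dot(v, v, sim))
    return soft_dot(u, v, sim) / denom if denom > 0 else 0.0

def mean_vector(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the weights of each mapping
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, sim, init=None, max_iter=20):
    """K-means variant: assign to the most similar centroid, then recompute centroids as means."""
    init = init if init is not None else list(range(k))
    centroids = [dict(vectors[i]) for i in init]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: soft_cosine(v, centroids[c], sim))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_vector(members)
    return labels

docs = [
    ["pine", "pine", "timber"],
    ["larch", "conifer", "lumber"],
    ["beetle", "pest"],
    ["insect", "pest", "disease"],
]
sim = build_term_similarity(THESAURUS)
vecs = tfidf(docs)
print(soft_cosine(vecs[0], vecs[1], {}))         # 0.0: no shared terms
print(soft_cosine(vecs[0], vecs[1], sim) > 0)    # True: pine~conifer, timber~lumber
print(kmeans(vecs, k=2, sim=sim, init=[0, 2]))   # [0, 0, 1, 1]
```

The first two documents share no terms, so an ordinary cosine on TF-IDF vectors treats them as unrelated. The thesaurus relations make them similar, and they end up in the same cluster. Note that with this plain TF-IDF formula, a term occurring in every document receives weight zero. Relations here are also not followed transitively (`pine` and `larch` are linked only through `conifer`), which is another simplifying assumption.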

Keywords/Search Tags:

Thesaurus, document clustering, similarity, forestry

Related items

1	Research On Semantic Similarity Computation And Applications
2	Knowledge Of The Semantics Of The Document Retrieval Method
3	Research On Domain Ontology Constructing And Retrieval In Forestry Area
4	Document Clustering Method Based On WAF
5	Research Of XML Document Clustering
6	Effects of similarity metrics on document clustering
7	Research On Efficient Document Clustering Using Improvised Sub-Document Based Framework
8	Clustering Research Of XML Document
9	Research On Document Clustering Based On Semantic Similarity Of Hownet
10	Research On Automatic Construction Of Natural Language Thesaurus