Research On Text Clustering Based On Semantic Similarity

Posted on:2008-12-12

Degree:Master

Type:Thesis

Country:China

Candidate:S Sun

GTID:2178360215997642

Subject:Computer applications

Abstract/Summary:

Text document clustering plays an important role in text mining and information retrieval systems. It can improve query results, provide intuitive navigation and browsing mechanisms, and find similar texts.

In text clustering applications, a text or document is usually represented with the Vector Space Model (VSM). This representation is very simple, but it raises a severe problem: the feature space is high-dimensional and the data are inherently sparse. In addition, this representation cannot handle polysemy and synonymy in text data. All of these problems greatly interfere with classification and clustering learning processes and cause their performance to drop dramatically.

The main techniques for addressing these problems are weight adjustment and dimensionality reduction, but both have their own defects. Weight adjustment does not solve these problems effectively, so it improves clustering quality only slightly. Dimensionality reduction does address high dimensionality, but it is costly. Moreover, although many clustering algorithms exist, they handle neither high dimensionality nor the need for understandable descriptions of the clusters.

To solve the problems above, this thesis proposes a new text clustering method based on semantic similarity, TCUSS (Text Clustering Using Semantic Similarity). The method represents each text as a concept list. This representation not only reduces the feature dimension but also makes semantic similarity convenient to calculate. TCUSS uses WordNet to calculate the semantic similarity between the concepts in two concept lists. Semantic similarity resolves the polysemy and synonymy problems and also reflects the content similarity between two texts. TCUSS clusters texts through graph analysis, so that the result is independent of the shape of the clusters. Experimental results show that TCUSS improves the correctness of text clustering.
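The abstract does not name the concept-level similarity measure, the way two concept lists are combined, or the specific graph algorithm, so the following is only a minimal Python sketch of such a pipeline under stated assumptions. A tiny hand-written is-a hierarchy (`PARENT`, hypothetical) stands in for WordNet; Wu-Palmer similarity is swapped in as the concept-level measure; two concept lists are compared by a symmetrised average best-match score (an assumption); and clusters are taken as connected components of a graph whose edges join texts with similarity at or above a hypothetical `threshold`.

```python
from itertools import combinations

# Hypothetical toy is-a hierarchy standing in for WordNet (child -> parent).
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
    "truck": "vehicle",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root] in the toy hierarchy."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    # The root has depth 1, as in the Wu-Palmer formulation.
    return len(path_to_root(concept))


def wu_palmer(c1, c2):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    ancestors = set(path_to_root(c2))
    # Lowest common subsumer: first ancestor of c1 that is also an ancestor of c2.
    lcs = next(a for a in path_to_root(c1) if a in ancestors)
    return 2 * depth(lcs) / (depth(c1) + depth(c2))


def list_similarity(a, b):
    """Assumed list-level score: average best match, averaged over both directions."""
    def directed(x, y):
        return sum(max(wu_palmer(c, d) for d in y) for c in x) / len(x)
    return (directed(a, b) + directed(b, a)) / 2


def graph_clusters(docs, threshold):
    """Link texts whose similarity >= threshold and return the connected components."""
    n = len(docs)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if list_similarity(docs[i], docs[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Each text is already reduced to a concept list (the extraction step is not shown).
docs = [["dog", "cat"], ["cat"], ["car", "truck"], ["truck"]]
print(graph_clusters(docs, threshold=0.7))
```

With `threshold=0.7`, the two animal texts and the two vehicle texts end up in separate components, because cross-topic concept pairs only share the root `entity`. Connected components are used here because they make no assumption about cluster shape, which echoes the abstract's motivation. A real implementation would look up WordNet synsets instead (for example through NLTK's WordNet reader and `Synset.wup_similarity`) and would need a word-sense disambiguation step to map words to concepts.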

Keywords/Search Tags:

text clustering, semantic similarity, text representation, clustering algorithm, semantic network

Related items

1	Study On Similarity-based Text Clustering Algorithm And Its Application
2	Research On Text Clustering Algorithm Based On Word Frequency And Semantic
3	Search Of Group Intelligent Text Clustering Methods Based On Semantic Similarity
4	Clustering Algorithm Research Of Short Text Based On Semantic Similarity
5	Research On Thesis Text Clustering Based On Semantic Similarity
6	Study On The Chinese Text Clustering Algorithm Based On Semantic Similarity
7	Study Of Text Clustering Algorithm Based On Semantics
8	Research Of Web Text Clustering Based On Semantic
9	The Research On Chinese Sentential Semantic Model Parsing And Text Representation
10	Research On Text Clustering Algorithm Based On Semantic Similarity