Research On Text Clustering Based On Hownet

Posted on:2013-10-03

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhang

Full Text:PDF

GTID:2248330392955385

Subject:Computer technology

Abstract/Summary:

K-Means is a classical data mining algorithm with the advantages of a simple form and low time and space cost, and it is widely used in text mining. This thesis studies the key techniques and algorithms of text clustering, proposes a new method of computing text similarity based on HowNet, and improves the K-Means algorithm. The main work is to explore how three text similarity measures affect K-Means: the classical measure based on the vector space model, a measure based on HowNet, and a measure that also takes word position information into account. K-Means is implemented with each of the three. To define the HowNet-based measure, the thesis proposes a new way of generating the vector space: the vector for a text is built from the words of that text alone, so its dimension equals the number of words in that single text rather than the number of words in the whole text collection. This reduces the high dimensionality and sparsity of the representation. The thesis also discusses the relation between this space and Euclidean space. To define the position-aware measure, the thesis proposes that the similarity of two words should be determined by both their similarity in meaning and their similarity in position. It also explores how the similarity of two words can be corrected.
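
The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea, not the thesis's actual method. It represents each text by its own word list and scores two words by both meaning and position. Everything named here is an assumption made for illustration: `toy_meaning_sim` is a character-overlap placeholder standing in for a real HowNet-based word similarity, the weighted sum with `alpha` is one possible way to combine the two parts, and the averaged best-match rule in `text_sim` is one possible way to compare texts of different lengths.

```python
from typing import List


def toy_meaning_sim(a: str, b: str) -> float:
    """Hypothetical placeholder for a HowNet-based word similarity.

    Uses Jaccard overlap of characters; a real system would query HowNet.
    """
    if a == b:
        return 1.0
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def position_sim(i: int, n: int, j: int, m: int) -> float:
    """Similarity of the relative positions i/n and j/m (assumed form)."""
    return 1.0 - abs(i / n - j / m)


def word_sim(a: str, i: int, n: int, b: str, j: int, m: int,
             alpha: float = 0.8) -> float:
    """Combine meaning and position similarity with an assumed weight alpha."""
    return alpha * toy_meaning_sim(a, b) + (1.0 - alpha) * position_sim(i, n, j, m)


def text_sim(t1: List[str], t2: List[str], alpha: float = 0.8) -> float:
    """Average best-match word similarity, symmetrised over both directions.

    Each text is described only by its own words, so no corpus-wide
    vocabulary vector is ever built.
    """
    if not t1 or not t2:
        return 0.0

    def directed(x: List[str], y: List[str]) -> float:
        total = 0.0
        for i, a in enumerate(x):
            # Best match for word a among all words of the other text.
            total += max(word_sim(a, i, len(x), b, j, len(y), alpha)
                         for j, b in enumerate(y))
        return total / len(x)

    return 0.5 * (directed(t1, t2) + directed(t2, t1))


if __name__ == "__main__":
    doc_a = ["text", "clustering", "with", "kmeans"]
    doc_b = ["kmeans", "text", "clustering"]
    print(text_sim(doc_a, doc_b))
```

A pairwise measure like this yields similarities rather than coordinates in a shared space, so plugging it into K-Means would also require some assumed rule for choosing cluster representatives (for example, picking the member text most similar to the others); the abstract does not say how the thesis handles this.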

Keywords/Search Tags:

text clustering, vector space model, HowNet, textual similarity

Related items

1	Research And Implementation Of Text Similarity Computing Based On HowNet Sememe Space
2	Study On Similarity-based Text Clustering Algorithm And It's Application
3	Text Similarity Computing Theory And Applied Research
4	Research On English Text Clustering Method Based On Vector Space
5	Research And Implementation Of Chinese Text Clustering Algorithms
6	Evaluation Method Research Of Automatic Summarization Calculating The Similarity Of Text Based On HowNet
7	Research On Semantic Textual Similarity
8	Research On Document Clustering Based On Semantic Similarity Of Hownet
9	Text Classification Based On Word Vector And Topic Vector
10	Semantic Similarity Calculation Text Field Vector Space Model