Research On Swarm Intelligence Text Clustering Methods Based On Semantic Similarity

Posted on:2013-06-13

Degree:Master

Type:Thesis

Country:China

Candidate:H Tao

GTID:2248330362972015

Subject:Computer application technology

Abstract/Summary:

The world is in an era of information explosion. Users are often overwhelmed by information when they search, which greatly reduces search efficiency. How to classify and organize information quickly and efficiently, and how to provide users with accurate and useful information, are problems that urgently need to be solved. Against this background, text mining technology is attracting more and more attention. Text clustering is an important component of text mining: it is the application of clustering methods to the field of text processing.

Text clustering can group texts without any class information. Because of this advantage, it has been widely used in multi-document summarization systems, search engines, digital libraries, and similar applications. At present, most clustering algorithms are based on the vector space model, which leaves text clustering facing some common problems: high dimensionality, high sparsity, and disregard for semantic information. These problems affect the performance and accuracy of the algorithms.

This thesis first introduces the concepts and methods of text clustering, including the calculation of distance between texts, the text representation model, text preprocessing, the evaluation of clustering results, and commonly used clustering algorithms. It then presents the organizational structure of HOWNET, the related concepts, and the calculation of semantic similarity; it proposes an improved method for calculating the similarity between texts, combines that method with the K-means algorithm, and uses experimental data to verify the method. Finally, it introduces two kinds of swarm intelligence algorithms and proposes a hybrid intelligent algorithm based on the semantic similarity between texts.

During feature extraction in the text preprocessing stage, term weights are calculated by taking into account not only term frequency and document frequency, but also each word's part of speech and its location in the text.
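
The weighting idea described above can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formula, so everything here is an assumption: the multiplicative form, the smoothed inverse document frequency, the function name `term_weights`, and the factor values in `POS_FACTOR` and `LOC_FACTOR` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical multipliers; the abstract does not state the values used in the thesis.
POS_FACTOR = {"noun": 1.0, "verb": 0.8, "other": 0.5}
LOC_FACTOR = {"title": 2.0, "body": 1.0}


def term_weights(doc, doc_freq, n_docs):
    """Weight each term of one document.

    doc      -- list of (term, part_of_speech, location) tuples
    doc_freq -- mapping term -> number of documents containing the term
    n_docs   -- total number of documents in the collection
    """
    # Term frequency: how often each term occurs in this document.
    tf = Counter(term for term, _, _ in doc)

    # For each term, keep the largest part-of-speech * location factor
    # seen among its occurrences (an assumed way to combine occurrences).
    boost = {}
    for term, pos, loc in doc:
        factor = POS_FACTOR.get(pos, POS_FACTOR["other"]) * LOC_FACTOR.get(loc, 1.0)
        boost[term] = max(boost.get(term, 0.0), factor)

    weights = {}
    for term, count in tf.items():
        # Smoothed inverse document frequency, so rarer terms weigh more.
        idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0
        weights[term] = count * idf * boost[term]
    return weights


doc = [("cluster", "noun", "title"), ("cluster", "noun", "body"), ("run", "verb", "body")]
weights = term_weights(doc, {"cluster": 1, "run": 1}, n_docs=3)
```

In this toy input, "cluster" outweighs "run" because it occurs twice, is a noun, and appears in the title.
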
Because the vector space model ignores the semantic information of words, the thesis uses HOWNET to calculate the similarity of texts from the semantic information of their words. Building on the results of earlier work, it proposes an algorithm that merges the K-means algorithm, the ant colony algorithm, and the simulated annealing algorithm for text clustering, exploiting their respective advantages while avoiding their shortcomings. The experimental data show that the algorithm is effective.
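
As a rough illustration of how simulated annealing can refine a clustering driven by pairwise similarity, the sketch below is offered under stated assumptions and is not the thesis's algorithm. It leaves out the ant colony component entirely, uses a precomputed matrix `sim` as a stand-in for HOWNET-based text similarity, and scores a clustering with an assumed objective (the sum of `sim[i][j] - threshold` over same-cluster pairs). The names `anneal_clusters` and `cluster_score`, the `threshold` value, and the cooling schedule are hypothetical. An initial labeling, for example from K-means, can be passed through `labels`.

```python
import math
import random


def cluster_score(labels, sim, threshold=0.5):
    """Assumed objective: reward similar pairs that share a cluster and
    penalize dissimilar pairs that share a cluster."""
    total = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                total += sim[i][j] - threshold
    return total


def anneal_clusters(sim, k, labels=None, t0=1.0, cooling=0.95, steps=2000, seed=0):
    """Refine cluster labels by simulated annealing over single-document moves."""
    rng = random.Random(seed)
    n = len(sim)
    # Start from the given labels (e.g. a K-means result) or from random ones.
    labels = list(labels) if labels is not None else [rng.randrange(k) for _ in range(n)]
    score = cluster_score(labels, sim)
    best, best_score = labels[:], score
    t = t0
    for _ in range(steps):
        i = rng.randrange(n)
        new = rng.randrange(k)
        if new == labels[i]:
            continue
        old = labels[i]
        labels[i] = new
        delta = cluster_score(labels, sim) - score
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            score += delta
            if score > best_score:
                best, best_score = labels[:], score
        else:
            labels[i] = old  # reject the move
        t = max(t * cooling, 1e-6)  # cool down, with a floor to avoid division by zero
    return best, best_score


# Toy similarity matrix: documents 0-1 are alike, and documents 2-3 are alike.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
best_labels, best_score = anneal_clusters(sim, k=2)
```

Accepting some worsening moves at high temperature is what lets the search leave a poor starting partition, which is the usual motivation for pairing annealing with a method such as K-means that is sensitive to initialization.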

Keywords/Search Tags:

Text Clustering, Semantic Similarity, K-Means Algorithm, Ant Colony Algorithm, Simulated Annealing Algorithm

Related items

1	Study On The Chinese Text Clustering Algorithm Based On Semantic Similarity
2	Study On Similarity-based Text Clustering Algorithm And Its Application
3	Research Of Text Clustering And Classification Method Based On Genetic Annealing Algorithms
4	Similarity Calculation Of Three-dimensional Models Based On Genetic Annealing Algorithm
5	Research On Text Clustering Algorithm Based On Word Frequency And Semantic
6	Research On Text Clustering Based On Semantic Similarity
7	An Optimized Sequence Ib Text Clustering Algorithm
8	Soft-sensing Modeling Method Based On Clustering Algorithm Of Multi-operating System
9	Study On Similarity-based Text Clustering Algorithm And Its Application
10	The Research Of Router Nodes Placement Problem Based On Simulated Annealing--Genetic Algorithm In Wireless Mesh Networks