
Research on Swarm Intelligence Text Clustering Methods Based on Semantic Similarity

Posted on: 2013-06-13 | Degree: Master | Type: Thesis
Country: China | Candidate: H Tao | Full Text: PDF
GTID: 2248330362972015 | Subject: Computer application technology
Abstract/Summary:
We live in an era of information explosion. Users are often overwhelmed by information when they search, which greatly reduces search efficiency. How to classify and organize information quickly and efficiently, and how to provide users with accurate and useful information, are problems that urgently need to be solved. Against this background, text mining technology is attracting more and more attention. Text clustering is an important component of text mining: it is the application of clustering methods to the field of text processing.

Text clustering can group texts without class label information. Because of this advantage, it has been widely used in multi-document summarization systems, search engines, digital libraries, and so on. At present, most clustering algorithms are based on the vector space model, which causes text clustering to face some common problems, such as high dimensionality, high sparsity, and the neglect of semantic information. These problems affect the performance and accuracy of the algorithms.

This paper first introduces concepts and methods of text clustering, including the calculation of distance between texts, text representation models, text preprocessing, the evaluation of clustering results, and commonly used clustering algorithms. It then presents the organizational structure of HOWNET, related concepts, and the calculation of semantic similarity, along with an improved method for calculating the similarity between texts and its combination with the K-means algorithm; experimental data are used to demonstrate the correctness of the method. Finally, it introduces two kinds of swarm intelligence algorithms and proposes a hybrid intelligent algorithm based on the semantic similarity between texts.

In the feature extraction step of text preprocessing, term weights are calculated by taking into account not only term frequency and document frequency but also the part of speech of each word and its location in the text.
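The abstract does not give the weighting formula, so the following is only a minimal sketch of how term frequency, document frequency, part of speech, and location could be combined into one weight. The function name `term_weight`, the multiplicative form, and every factor value are assumptions made for illustration, not the thesis's actual scheme.

```python
import math

# Hypothetical multipliers; the abstract does not state the real values.
POS_FACTOR = {"noun": 1.5, "verb": 1.2}        # other parts of speech default to 1.0
LOCATION_FACTOR = {"title": 2.0, "body": 1.0}  # unknown locations default to 1.0


def term_weight(tf, df, n_docs, pos, location):
    """TF-IDF scaled by part-of-speech and location factors (illustrative only)."""
    idf = math.log(n_docs / df)  # rarer terms receive a larger weight
    return tf * idf * POS_FACTOR.get(pos, 1.0) * LOCATION_FACTOR.get(location, 1.0)
```

Under these assumed factors, a noun that appears in the title is weighted three times as heavily as a body word of unlisted part of speech with the same term and document frequencies.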
Because the vector space model ignores the semantic information of words, the paper uses HOWNET to calculate the similarity of texts from the semantic information of their words. Building on the results of previous work, this dissertation proposes an algorithm that merges the K-means algorithm, the ant colony algorithm, and the simulated annealing algorithm to study the text clustering problem, exploiting their respective advantages and avoiding their shortcomings. The experimental data demonstrate the effectiveness of the algorithm.
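As a rough illustration of computing text similarity from word-level semantic similarity, the sketch below averages, in both directions, each word's best match in the other text. This is one common aggregation scheme and is not claimed to be the improved method of the thesis. The `word_sim` argument stands in for a HOWNET-based word similarity; `toy_word_sim` and its values are hypothetical placeholders.

```python
def text_similarity(words_a, words_b, word_sim):
    """Symmetric average of best-match word similarities between two texts."""
    if not words_a or not words_b:
        return 0.0

    def directed(src, dst):
        # For each source word, take its most similar word in the other text.
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)

    return 0.5 * (directed(words_a, words_b) + directed(words_b, words_a))


# Toy stand-in for a HOWNET-based word similarity (hypothetical values).
TOY_SIM = {frozenset(["car", "automobile"]): 0.9}


def toy_word_sim(w, v):
    """Return 1.0 for identical words, a table value for known pairs, else 0.0."""
    return 1.0 if w == v else TOY_SIM.get(frozenset([w, v]), 0.0)
```

A clustering algorithm such as K-means could then use `1 - text_similarity(a, b, word_sim)` as a distance, so that texts sharing no surface vocabulary can still be judged close.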
Keywords/Search Tags: Text Clustering, Semantic Similarity, K-Means Algorithm, Ant Colony Algorithm, Simulated Annealing Algorithm