
Research And Implementation Of The Text Cluster Based On Text Similarity Calculation

Posted on: 2011-11-08  Degree: Master  Type: Thesis
Country: China  Candidate: Q Geng  Full Text: PDF
GTID: 2178330332460376  Subject: Computer application technology
Abstract/Summary:
Text clustering is an important data mining technique for improving the efficiency of text mining and knowledge retrieval. In practice, government departments draw up contingency plans based on the viewpoints they find after scanning large volumes of text-based information. Human time and energy are limited, while the supply of information is effectively unlimited, so a technique is needed that categorizes information quickly and raises working efficiency. This thesis devises a framework for a text clustering system and discusses the design and implementation of each module in detail. The main contributions are as follows:
1. A Key Word Concept List is designed as the text representation model, to address two problems of the Vector Space Model: the high dimensionality of the feature space and the inherent sparsity of the data. It is widely held that locating the right keywords is equivalent to grasping the gist of a paper, and semi-structured texts can be converted into structured models by applying rules that compute the weights of these keywords.
2. A semantic corpus is built from the text sets to be processed and serves as semantic support. A weighted method for calculating text similarity is adopted that uses the text representation model as an index and key sentence groups extracted from the original texts, and that combines semantic analysis measures such as surface similarity, semantic similarity, and the influence of word order.
3. Building on DBSCAN, a new text clustering algorithm is designed that adjusts the precondition for clustering with respect to the relation intensity among data objects within the same cluster.
Finally, a text clustering system is built on the basis of the above, and the conclusions and usability of the work are analyzed through experiments.
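The pipeline outlined in the abstract (keyword-based representation, a weighted similarity score, density-based clustering) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not specify the keyword weighting rules, the semantic corpus, or how DBSCAN is modified. The sketch therefore assumes plain keyword counts as weights, omits the semantic similarity term, and uses standard DBSCAN with a similarity threshold in place of a distance radius. The names `surface_similarity`, `order_similarity`, `text_similarity`, `dbscan`, and the parameters `w_surface`, `w_order`, `min_sim`, and `min_pts` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def surface_similarity(seq_a, seq_b):
    """Weighted keyword overlap (generalized Jaccard); weights are raw counts (assumption)."""
    a, b = Counter(seq_a), Counter(seq_b)
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0

def order_similarity(seq_a, seq_b):
    """Fraction of shared-keyword pairs that keep the same relative order in both texts."""
    in_b = set(seq_b)
    shared = list(dict.fromkeys(w for w in seq_a if w in in_b))  # first occurrences, in order
    if len(shared) < 2:
        return 1.0 if shared else 0.0
    pos_b = {w: seq_b.index(w) for w in shared}
    pairs = list(combinations(shared, 2))
    return sum(1 for x, y in pairs if pos_b[x] < pos_b[y]) / len(pairs)

def text_similarity(seq_a, seq_b, w_surface=0.7, w_order=0.3):
    """Weighted sum of the two components; the weights are illustrative, not from the thesis."""
    return (w_surface * surface_similarity(seq_a, seq_b)
            + w_order * order_similarity(seq_a, seq_b))

def dbscan(items, sim, min_sim, min_pts):
    """Standard DBSCAN where the neighbourhood of a point is every item with sim >= min_sim."""
    NOISE = -1
    labels = [None] * len(items)

    def neighbors(i):
        return [j for j in range(len(items)) if sim(items[i], items[j]) >= min_sim]

    cluster = 0
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:     # core point: expand the cluster through it
                queue.extend(nb)
        cluster += 1
    return labels

# Toy keyword sequences standing in for extracted key word lists.
docs = [
    ["text", "cluster", "similarity"],
    ["text", "similarity", "cluster"],
    ["cluster", "text", "similarity", "semantic"],
    ["budget", "tax", "policy"],
    ["tax", "budget", "policy"],
    ["weather"],
]
labels = dbscan(docs, text_similarity, min_sim=0.5, min_pts=2)
```

With these toy inputs, the first three texts fall into one cluster, the next two into a second, and the last text is labelled noise (`-1`). A semantic similarity term backed by a corpus could be added to `text_similarity` as a third weighted component.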
Keywords/Search Tags:Data Mining, Text Similarity, Text Clustering