Research And Implementation Of The Text Cluster Based On Text Similarity Calculation

Posted on:2011-11-08

Degree:Master

Type:Thesis

Country:China

Candidate:Q Geng

Full Text:PDF

GTID:2178330332460376

Subject:Computer application technology

Abstract/Summary:

Text clustering is an important data mining technology that improves the efficiency of text mining and knowledge retrieval. In practice, government departments draw up contingency plans based on the viewpoints they extract after scanning large volumes of text-based information. Human time and energy are limited, while the supply of information is effectively unlimited, so a technology is needed that categorizes information quickly and increases working efficiency. This thesis devises a framework for a text clustering system and analyzes the design and implementation of each module in detail. The major contributions are as follows:

1. A Key Word Concept List is designed as a text representation model to address two problems of the Vector Space Model: the high dimensionality of the feature space and the inherent sparsity of the data. It is widely believed that locating the proper keywords is equivalent to grasping the gist of a paper; semi-structured texts can therefore be translated into structured models by applying rules that compute the weights of these keywords.

2. Based on the text sets to be processed, a semantic corpus is built as semantic support. A weighted method for calculating text similarity is adopted, which uses the text representation model as indexes, draws on key sentence groups extracted from the original texts, and combines semantic analysis factors such as surface similarity, semantic similarity and the influence of word order.

3. Building on DBSCAN, a new text clustering algorithm is designed that adjusts the precondition for clustering so that the relation intensity among the data objects within the same cluster is strengthened.

Finally, a text clustering system is built on the work above, and the conclusions and usability of this thesis are analyzed through experiments.
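The overall pipeline described in the abstract (keyword weights, pairwise text similarity, density-based clustering) can be illustrated with a minimal Python sketch. It is not the thesis's method. The function names `keyword_weights`, `cosine` and `dbscan`, the frequency-based keyword weights, the cosine similarity and the thresholds `min_sim` and `min_pts` are all assumptions made for illustration. The sketch uses plain DBSCAN with a similarity threshold, not the Key Word Concept List, the semantic and word-order similarity, or the modified DBSCAN that the thesis proposes.

```python
import math
from collections import Counter


def keyword_weights(text, top_k=10):
    """Toy keyword model (assumption): keep the top_k most frequent tokens
    longer than two characters, weighted by relative frequency."""
    tokens = [t for t in text.lower().split() if len(t) > 2]
    counts = Counter(tokens).most_common(top_k)
    total = sum(c for _, c in counts) or 1
    return {word: c / total for word, c in counts}


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dbscan(items, sim, min_sim=0.5, min_pts=2):
    """Standard DBSCAN expressed with a similarity threshold.
    Returns one label per item; -1 marks noise."""
    n = len(items)
    labels = [None] * n

    def neighbors(i):
        # Every item (including i itself) at least min_sim similar to item i.
        return [j for j in range(n) if sim(items[i], items[j]) >= min_sim]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1                # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nb_j)
    return labels


docs = [
    "text clustering groups similar text documents",
    "clustering similar documents with text similarity",
    "police report on traffic accident",
    "traffic accident report from police",
    "recipe for tomato soup",
]
models = [keyword_weights(d) for d in docs]
labels = dbscan(models, cosine, min_sim=0.5, min_pts=2)
print(labels)
```

With these toy documents the two clustering-related texts fall into one cluster, the two police reports into another, and the recipe is left as noise. Swapping `cosine` for a weighted combination of several similarity measures would move the sketch closer to the kind of calculation the abstract describes.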

Keywords/Search Tags:

Data Mining, Text Similarity, Text Clustering

Related items

1	Key Techniques Of Text Mining On Criminal Cases
2	Research On Key Problems In Text Mining Based On Cloud Method
3	Study On Similarity-based Text Clustering Algorithm And Its Application
4	Chinese Text Clustering Based On Text Similarity
5	Research And Implementation Of Text Mining Technology Based On Public Security Information
6	Research On Web Text Mining
7	Research On Text Clustering Methods And Their Applications
8	The Study And Application Of Web Text Data Mining Technology Based On The Approximate Pages Clustering Algorithm
9	Research On Key Problems About Large-Scale Text Clustering
10	Research On Short Text Clustering Techniques And The Applications On Emails