Researches On Semantic-Based Search In Text Clustering

Posted on:2012-03-06

Degree:Master

Type:Thesis

Country:China

Candidate:Y Su

GTID:2218330338470391

Subject:Computer software and theory

Abstract/Summary:

Text clustering is an important branch of data mining. With the ever-increasing volume of information, text clustering now plays a significant role in our daily work and life. Much research has been done on text clustering and has produced some initial achievements, but current approaches still leave considerable room for improvement. Based on a review and comparative analysis of existing research, this paper identifies two key points for improving text clustering. On the one hand, the clustering algorithm, as the core of text clustering, directly determines the effectiveness and efficiency of clustering. However, there is currently no clustering algorithm designed specifically for text clustering, and most algorithms cannot perform well in terms of both algorithmic complexity and clustering efficiency. On the other hand, most approaches do not adequately consider the impact of semantic factors on text clustering, or cannot effectively integrate semantic factors into the clustering process. Consequently, the results of text clustering are unsatisfactory.

To balance the complexity of a clustering algorithm against its clustering quality, this paper introduces several representative clustering algorithms and analyzes their advantages and disadvantages in text clustering. It then proposes a density-based clustering algorithm named DBCKNN, which combines the advantages of partition-based and density-based clustering algorithms. By applying the concepts of k-nearest neighbors and outlier degree, the algorithm can rapidly find the center and radius of each cluster in a data set, improving clustering efficiency while maintaining good clustering effectiveness.

To integrate semantic factors effectively into the clustering process, this paper proposes a semantics-based text clustering method that transforms the vector space model (VSM) into a VSM' model.
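The DBCKNN idea summarized above (use k-nearest neighbors and an outlier degree to locate each cluster's center and radius quickly) can be illustrated with a minimal sketch. The abstract does not give DBCKNN's actual formulas, so everything here is an assumption: the outlier degree is taken to be a point's mean distance to its k nearest neighbors, centers are chosen greedily from the densest unassigned point, the radius is a hypothetical multiple (`radius_factor`) of the center's outlier degree, and points whose degree exceeds `outlier_factor` times the median are never used as centers. The function names `outlier_degree` and `dbcknn_sketch` are illustrative only, not the thesis's implementation.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def outlier_degree(points: List[Point], k: int) -> List[float]:
    """Hypothetical outlier degree: mean distance to the k nearest neighbours.

    Small values mean the point sits in a dense region; large values mark
    likely outliers.
    """
    degrees = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        degrees.append(sum(dists[:k]) / k)
    return degrees


def dbcknn_sketch(
    points: List[Point],
    k: int = 3,
    radius_factor: float = 2.0,
    outlier_factor: float = 3.0,
) -> Tuple[List[int], List[Tuple[int, float]]]:
    """Greedy kNN/outlier-degree clustering (an assumed reading of DBCKNN).

    Returns (labels, [(center_index, radius), ...]); label -1 marks points
    left unassigned as outliers.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    degree = outlier_degree(points, k)
    # Points whose outlier degree exceeds this threshold never become centres.
    threshold = outlier_factor * statistics.median(degree)
    labels = [-1] * len(points)
    centers: List[Tuple[int, float]] = []
    # Visit candidate centres from densest (lowest degree) to sparsest.
    for c in sorted(range(len(points)), key=lambda i: degree[i]):
        if labels[c] != -1 or degree[c] > threshold:
            continue
        radius = radius_factor * degree[c]
        cluster_id = len(centers)
        # Every still-unassigned point inside the radius joins this cluster.
        for i, p in enumerate(points):
            if labels[i] == -1 and euclidean(p, points[c]) <= radius:
                labels[i] = cluster_id
        centers.append((c, radius))
    return labels, centers
```

On two well-separated 2×2 grids of points plus one distant point, with `k=3`, this sketch assigns each grid to its own cluster and leaves the distant point labeled -1. Because coverage is a single radius per center, an elongated cluster could be split into several pieces; any merging or refinement rules DBCKNN may use are not described in the abstract.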
To build VSM', we twist each pair of dimensions of the VSM model according to the similarity between them, transforming the orthogonal coordinate system into an oblique coordinate system. The feature vector of each document is then projected into the VSM' model. Applying a traditional text clustering method in the VSM' model relatively decreases the distance between feature vectors that are semantically related. This method improves the recall and precision of text clustering and makes the clustering results more semantically meaningful. This paper also verifies the effectiveness and accuracy of the proposed algorithm and method through theoretical analysis and experiments. Finally, the paper evaluates the rationality of this work and outlines prospects for future developments in the field of text clustering.
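The oblique-coordinate idea above can be made concrete with a short sketch. The abstract does not say how term similarities are obtained or exactly how the projection is computed, so this assumes one common reading: the cosine of the angle between term axes i and j is set to their similarity s_ij (with s_ii = 1), so the inner product of two document vectors x and y becomes xᵀSy. To let a traditional clustering algorithm work unchanged, the sketch re-expresses each document in an orthonormal frame using the Cholesky factor S = LLᵀ (x' = Lᵀx), which preserves those oblique inner products. The Cholesky step, the function names `to_vsm_prime` and `cosine`, and the toy similarity values are all hypothetical.

```python
import numpy as np


def to_vsm_prime(doc_term: np.ndarray, term_sim: np.ndarray) -> np.ndarray:
    """Re-express VSM document rows in an orthonormal frame for the oblique basis.

    Assumes term_sim[i, j] is the cosine of the angle between term axes i and j
    (1.0 on the diagonal) and that the matrix is symmetric positive definite.
    After the mapping, ordinary dot products equal x^T S y in the oblique space,
    so any traditional clustering algorithm can run on the returned rows.
    """
    S = np.asarray(term_sim, dtype=float)
    if not np.allclose(S, S.T):
        raise ValueError("term similarity matrix must be symmetric")
    L = np.linalg.cholesky(S)  # S = L @ L.T; raises LinAlgError if S is not positive definite
    # Row-vector form of x' = L.T @ x for every document row x.
    return np.asarray(doc_term, dtype=float) @ L


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Hypothetical terms: 0 = "car", 1 = "automobile", 2 = "banana".
    S = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
    docs = np.eye(3)  # each toy document mentions exactly one term
    projected = to_vsm_prime(docs, S)
    print(cosine(docs[0], docs[1]))  # 0.0 in the orthogonal VSM
    print(round(cosine(projected[0], projected[1]), 6))  # 0.8 after projection
```

In the orthogonal VSM, a document about "car" and one about "automobile" share no terms and have zero similarity; after projection their similarity equals the assumed term similarity of 0.8, while the unrelated "banana" document stays orthogonal. This mirrors the claim that VSM' reduces the distance between semantically related feature vectors.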

Keywords/Search Tags:

text clustering, density-based, clustering algorithm, VSM model, semantic

Related items

1	Research On Text Clustering Algorithm Based On Word Frequency And Semantic
2	Research On Clustering Algorithm Based On Density Peak And Its Application In Text Clustering
3	The Research And Application Of Text Clustering Based On Improved K-means Algorithm
4	Study On New Data And Text Clustering Methods Based On Representatives
5	Research On Text Clustering Algorithm Based On Semantic Similarity
6	Density Clustering Algorithm Based On Improved Support Vector Machine
7	Research On Density Peak-based Clustering Algorithm And Its Parallel Implementation
8	Research On Text Clustering Based On Semantic Similarity
9	Study Of Text Clustering Algorithm Based On Semantics
10	Manifold Density Peak Clustering Algorithm And Its Application Of Weibo Text Classification