
Researches On Semantic-Based Search In Text Clustering

Posted on: 2012-03-06  Degree: Master  Type: Thesis
Country: China  Candidate: Y Su  Full Text: PDF
GTID: 2218330338470391  Subject: Computer software and theory
Abstract/Summary:
Text clustering is an important branch of data mining. As the volume of information grows, text clustering plays an increasingly significant role in daily work and life. A great deal of research has been done on text clustering, with some initial achievements, but current approaches still leave much room for improvement. Based on a survey and comparative analysis of existing research, this paper identifies two key issues. First, the clustering algorithm is the core of text clustering and directly determines its effectiveness and efficiency; however, no clustering algorithm has been designed specifically for text clustering, and most algorithms cannot perform well in both algorithmic complexity and clustering efficiency. Second, most approaches do not adequately consider the influence of semantic factors on text clustering, or cannot integrate those factors effectively into the clustering process. Consequently, the results of text clustering are unsatisfactory.

To balance algorithmic complexity against clustering quality, this paper introduces several representative clustering algorithms, analyzes their advantages and disadvantages for text clustering, and proposes a density-based clustering algorithm named DBCKNN that combines the advantages of partition-based and density-based clustering algorithms. Using the concepts of k-nearest neighbor and outlier degree, the algorithm can rapidly find the center and radius of each cluster in a data set, improving clustering efficiency while maintaining effectiveness.

To integrate semantic factors effectively into the clustering process, this paper proposes a semantics-based text clustering method that transforms the VSM model into a VSM' model.
Each pair of dimensions in the VSM model is skewed according to the similarity between them, transforming the orthogonal coordinate system into an oblique one. The feature vector of each document is then projected into the VSM' model. A traditional text clustering method applied to the VSM' model reduces the distance between feature vectors that are semantically related. This method can increase the recall and precision of text clustering and make the clustering results more semantically meaningful. This paper also verifies the effectiveness and accuracy of the proposed algorithm and method through theoretical analysis and experiments. Finally, the paper evaluates the soundness of the work and discusses prospects for future developments in text clustering.
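
The abstract does not give the formulas behind the VSM' model, so the following is only a minimal sketch of one common way to realize an oblique term space, not the thesis's actual method. It assumes a term-term similarity matrix S (hypothetical values here) that acts as the Gram matrix of the skewed axes, so the inner product of two documents becomes x^T S y. The function names oblique_cosine and to_orthogonal, the three-term vocabulary, and the use of a Cholesky factor for the projection are all illustrative assumptions.

```python
import numpy as np

def oblique_cosine(x, y, S):
    """Cosine similarity when the term axes have Gram matrix S (assumed PSD)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    S = np.asarray(S, dtype=float)
    num = x @ S @ y
    den = np.sqrt((x @ S @ x) * (y @ S @ y))
    return num / den if den > 0 else 0.0

def to_orthogonal(X, S):
    """Map row vectors X into an orthogonal space that preserves x^T S y.

    Uses the Cholesky factor S = L L^T, which requires S to be
    positive definite. Ordinary dot products of the returned rows
    equal the oblique inner products of the original rows.
    """
    L = np.linalg.cholesky(np.asarray(S, dtype=float))
    return np.asarray(X, dtype=float) @ L

# Hypothetical vocabulary: "car", "automobile", "banana".
# The 0.8 entry is an assumed similarity between "car" and "automobile".
S = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

doc_a = [1, 0, 0]  # mentions only "car"
doc_b = [0, 1, 0]  # mentions only "automobile"
doc_c = [0, 0, 1]  # mentions only "banana"

# Orthogonal axes (plain VSM) versus skewed axes (VSM'-style).
plain_ab = oblique_cosine(doc_a, doc_b, np.eye(3))
skewed_ab = oblique_cosine(doc_a, doc_b, S)
skewed_ac = oblique_cosine(doc_a, doc_c, S)

# Projected vectors can be fed to any traditional clustering method.
P = to_orthogonal([doc_a, doc_b, doc_c], S)
```

Under these assumptions, doc_a and doc_b share no terms and are unrelated in the plain VSM, but become similar once the axes for related terms are skewed toward each other, while doc_c stays unrelated to both. Because to_orthogonal returns ordinary vectors, an unmodified clustering algorithm can run on P and still reflect the term similarities.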
Keywords/Search Tags: text clustering, density-based, clustering algorithm, VSM model, semantic