Researching The Kernel Clustering Algorithm And Its Application In Text Clustering

Posted on:2015-04-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Xu

GTID:2298330422488481

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

With the popularity of the Internet and the continuous improvement of network technology, the Internet has become the world's largest and richest information repository. However, when users query information, they are often lost in a sea of information, which greatly reduces retrieval efficiency.

Text clustering is an effective way to visualize and manage vast amounts of textual information. Because text clustering requires no category information and can group texts automatically, it has been widely used in information retrieval in recent years. There are many classic clustering methods, such as C-means clustering and fuzzy C-means clustering, but they work well only for some typical sample distributions. They cluster directly on the features of the samples without optimizing those features, so their effectiveness depends largely on how the samples are distributed. When the samples of some classes are widely scattered while those of others are tightly grouped, these methods perform relatively poorly.

The main idea of the kernel clustering method is to map the data points of the input space into a high-dimensional feature space through a nonlinear mapping, and to select an appropriate Mercer kernel function to replace the inner product of the nonlinear mapping, so that clustering can be carried out in the feature space. Because the inner product of feature vectors in the high-dimensional space can be computed directly by the kernel function from the low-dimensional input vectors, the computation does not increase with the number of dimensions.

In this paper, building on the basic theory of kernel methods and combining it with entropy theory, we propose subspace sample selection based on kernel FCM and maximum entropy fuzzy C-means clustering based on sample weighting and initial cluster centers (WKMEFCM). Finally, this paper applies them to text clustering.
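The kernel substitution described above can be illustrated with a minimal sketch. It assumes a Gaussian kernel and the membership update of a commonly used kernel fuzzy C-means variant with centers kept in the input space; the abstract does not specify the thesis's kernel or update rules, so the function names, the `sigma` and `m` parameters, and the sample data are all illustrative assumptions.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel, a standard Mercer kernel (assumed choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def feature_space_sq_dist(x, v, kernel=gaussian_kernel):
    """||phi(x) - phi(v)||^2 written with kernel evaluations only.

    The nonlinear mapping phi is never computed explicitly.
    """
    return kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v)

def kfcm_memberships(points, centers, m=2.0, kernel=gaussian_kernel):
    """Fuzzy memberships u[i][k] of point k in cluster i.

    Uses the usual fuzzy C-means membership formula with the
    kernel-induced distance in place of the Euclidean distance.
    """
    eps = 1e-12  # avoids division by zero when a point equals a center
    exponent = -1.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        weights = [(feature_space_sq_dist(x, v, kernel) + eps) ** exponent
                   for v in centers]
        total = sum(weights)
        for i, w in enumerate(weights):
            u[i][k] = w / total
    return u

# Illustrative data: two tight groups and one hypothetical center near each.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 2.9)]
centers = [(0.0, 0.1), (3.0, 3.1)]
u = kfcm_memberships(points, centers)
```

Each column of `u` sums to 1, and every point receives its largest membership in the cluster whose center is closest in the feature space. The cost of each distance depends only on the kernel evaluation over the input vectors, not on the dimension of the feature space.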
Experiments confirm that, with the introduction of the Mercer kernel function, features that were not apparent in the original data become prominent, so the clustering results are better for text data that is confusedly distributed, highly correlated, and difficult to separate.

Finally, based on the open-source Carrot2 framework, this paper builds a Chinese text clustering Web search system and implements clustering of search results. To account for the characteristics of Chinese, the feature weight calculation considers not only the traditional term frequency and document frequency but also the part of speech and the position of words in the text, which makes the weights more reliable. The proposed WKMEFCM algorithm is applied to the system, and the assessment shows that the system further improves the efficiency of information retrieval.
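The feature weighting idea above can be sketched as follows. The abstract does not give the actual formula, so this is only one plausible combination: a smoothed TF-IDF score in which each term occurrence is boosted by a part-of-speech factor and a position factor. The multiplier values, the tag names, and the `term_weights` function are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical multipliers; the source does not state its actual values.
POS_FACTOR = {"noun": 1.5, "verb": 1.2}   # other tags default to 1.0
TITLE_FACTOR = 2.0                        # boost for words appearing in the title

def term_weights(docs):
    """Weight terms by boosted term frequency times smoothed inverse document frequency.

    docs: list of documents, each a list of (term, pos_tag, in_title) tuples.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update({term for term, _, _ in doc})  # count each term once per document

    weighted = []
    for doc in docs:
        tf = defaultdict(float)
        for term, pos, in_title in doc:
            boost = POS_FACTOR.get(pos, 1.0)
            if in_title:
                boost *= TITLE_FACTOR
            tf[term] += boost  # each occurrence counts more for key parts of speech and positions
        weighted.append({
            term: freq * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, freq in tf.items()
        })
    return weighted

# Illustrative input: terms are assumed to be already segmented and tagged.
docs = [
    [("kernel", "noun", True), ("clustering", "noun", False), ("fast", "adj", False)],
    [("clustering", "noun", False), ("search", "noun", False)],
]
weights = term_weights(docs)
```

In this example, a noun in the title outranks a body noun that occurs in every document, which in turn outranks an adjective in the body. For Chinese text, the input tuples would come from a word segmenter and part-of-speech tagger, which this sketch leaves out.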

Keywords/Search Tags:

Text Clustering, Kernel Function, Subspace Samples Selection, Maximum Entropy Clustering, Feature Weighted

Related items

1	Research On Text Clustering Problems Of Kernel Function And Self-definite Category Number
2	Applications And Research On Possibilistic Fuzzy Kernel Clustering Algorithm Based On Sample-feature Weighted
3	Text Correlation Research Based On Subspace Clustering
4	Using A Weighted Network Graph Clustering And Subspace Ensemble Approach For High-dimension Data Classification
5	A Study Of Sparse Subspace Clustering For Image Sequence And Its Applications
6	Subspace Clustering Algorithm Based On Feature Selection And Sparse Representation
7	Research On Weighted Kernel FCM Algorithm With Double Variables And Its Validity Evaluation
8	Precise Clustering Algorithm For Chinese Text Based On K-means
9	The Research On Web Text Clustering Based On DBSCAN Optimized Algorithm
10	Research On Multi-view Clustering Algorithms