Research On Text Clustering Problems Of Kernel Function And Self-definite Category Number

Posted on:2009-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y P Zhang

Full Text:PDF

GTID:2178360245986447

Subject:Computer application technology

Abstract/Summary:

In recent years, the rapid development of the Internet has brought great convenience to people's lives. However, constant change means that many new texts are hard to describe with an existing category system. Reclassifying them requires rebuilding the labeled training sets, and obtaining large numbers of labeled samples is very expensive. Clustering research has therefore attracted growing attention.

At present, classical clustering methods such as k-means and fuzzy k-means work well only on samples with certain typical distributions; their effectiveness depends heavily on how the samples are distributed. For example, if one class of samples is widely spread while another is compact, the result is unsatisfactory, and if the sample distribution is irregular, the clustering result is poor.

A kernel function can map a nonlinear problem in a low-dimensional space into a high-dimensional space. The inner product in the high-dimensional space can be computed by applying the kernel function to the input vectors in the low-dimensional space, so the computational cost does not grow much as the dimension increases. Building on the basic theory of clustering, this research proposes a text clustering algorithm, namely a kernel clustering algorithm. Using a Mercer kernel, the input samples are mapped into a high-dimensional feature space, which improves the sample features, and clustering is performed in that feature space.

In addition, existing clustering algorithms require the number of clusters to be specified in advance. When the internal structure of the original data is unknown, it is hard to choose a suitable number. Based on an analysis of connected graphs in graph theory, this research proposes a graph kernel clustering method that determines the number of clusters automatically. Every data sample is regarded as a vertex in V, so that all data samples form an undirected weighted graph G = ⟨V, E⟩. A connectivity coefficient is defined from the perspective of connected graphs in graph theory, and it can reflect the best number of clusters. This coefficient not only assigns similar texts to the same connected graph but also has a clear physical meaning.
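
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the definition of its connectivity measure, so the sketch assumes a simple stand-in rule (link two samples when their kernel similarity reaches a chosen `threshold`) and takes each connected component of the resulting undirected graph as one cluster. The function names, the Gaussian kernel choice, the `gamma` and `threshold` values, and the toy data are all assumptions made for illustration.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel, a standard Mercer kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def feature_space_sq_distance(x, y, kernel):
    """Squared distance between the images of x and y in feature space,
    using kernel values only: K(x, x) - 2 K(x, y) + K(y, y)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

def graph_kernel_clusters(samples, kernel, threshold):
    """Assumed linking rule: connect two samples when their kernel similarity
    is at least `threshold`. Each connected component of the undirected graph
    becomes one cluster, so the cluster count is not supplied in advance."""
    n = len(samples)
    adjacency = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if kernel(samples[i], samples[j]) >= threshold:
                adjacency[i].append(j)
                adjacency[j].append(i)

    labels = [-1] * n
    n_clusters = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search labels every vertex reachable from `start`.
        labels[start] = n_clusters
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if labels[neighbour] == -1:
                    labels[neighbour] = n_clusters
                    stack.append(neighbour)
        n_clusters += 1
    return labels, n_clusters

# Toy 2-D vectors standing in for text feature vectors: two separated groups.
samples = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
labels, n_clusters = graph_kernel_clusters(samples, rbf_kernel, threshold=0.5)
print(labels, n_clusters)  # prints [0, 0, 0, 1, 1] 2
```

In this sketch the number of clusters falls out of the graph structure, but it still depends on the assumed `threshold`; the thesis's connectivity measure presumably plays the role of choosing that structure in a principled way.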

Keywords/Search Tags:

text clustering, clustering analysis, kernel function, kernel clustering, graph kernel clustering

Related items

1	Researching The Kernel Clustering Algorithm And Its Application In Text Clustering
2	Research On Kernel-based Hierarchical Clustering Algorithm
3	Research On Weighted Kernel FCM Algorithm With Double Variables And Its Validity Evaluation
4	Researches In Kernel-based Fuzzy C-Means Clustering Algorithm Based On GA Optimization
5	Research On Optimization Methods For Kernel K-means
6	Improved Fuzzy Kernel Clustering With Outliers
7	Clustering Analysis Study Based On Kernel Function
8	The Hybrid Clustering Algorithm Based On Nuclear Research
9	Hrrp Recognition Method Based On Kernel Clustering
10	Research Of Kernel Methods For Support Vector Machine And Multiple Kernel Clustering Algorithm