
Research On Text Clustering Problems Of Kernel Function And Self-definite Category Number

Posted on: 2009-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Zhang
GTID: 2178360245986447
Subject: Computer application technology
Abstract/Summary:
In recent years, the rapid development of the Internet has brought great convenience to people's lives. However, its constant, dynamic change means that many new texts are hard to describe with an existing category system. If the categories are reorganized, the labeled training sets must be built again, and obtaining a large number of labeled samples is very expensive. For this reason, research on clustering has attracted more and more attention.

At present, the classical k-means and fuzzy k-means clustering methods work well only on samples with certain typical distributions. The effectiveness of such hard and fuzzy clustering depends heavily on how the samples are distributed. For example, if one group of samples is widely spread while another is compact, the result is unsatisfactory, and if the sample distribution is irregular, the clustering result is poor.

A kernel function can map a nonlinear problem in a low-dimensional space into a high-dimensional space. Moreover, the inner product in the high-dimensional space can be computed from the low-dimensional input vectors through the kernel function, so the amount of computation does not grow much as the dimension increases. In this research, a text clustering algorithm, namely a kernel clustering algorithm, is proposed on the basis of the basic theory of clustering. By adopting a Mercer kernel, the input samples are mapped into a high-dimensional feature space, which optimizes the sample features, and clustering is then performed in that feature space.

At the same time, existing clustering algorithms require the number of clusters to be given in advance. When the internal structure of the original data is unknown, it is hard to choose a suitable number of clusters. In this research, a graph kernel clustering method that can determine the number of clusters automatically is proposed, based on an analysis of connected graphs in graph theory. Every data sample is regarded as a vertex in V, so that all data samples form an undirected weighted graph G = (V, E). In this paper, a connected modulus is defined from the perspective of connected graphs in graph theory, and it can fully reflect the best number of clusters. This modulus not only assigns similar texts to the same connected graph but also has a clear physical meaning.
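As a rough illustration of the two ideas in the abstract, the sketch below computes distances in the kernel-induced feature space using only kernel evaluations, then reads the number of clusters off the connected components of a graph over the samples. It is not the thesis's algorithm. The abstract does not give the definition of the connected modulus, so a fixed distance threshold stands in for it here, and the Gaussian kernel is only one possible choice of Mercer kernel. The function names, the toy data, and the values of sigma and threshold are all hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; assumed here as one example of a Mercer kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_distance_sq(x, y, kernel):
    """Squared feature-space distance, using only kernel values:
    ||phi(x) - phi(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


def connected_components(samples, kernel, threshold):
    """Join two samples by an edge when their feature-space squared
    distance is below `threshold` (a stand-in for the thesis's connected
    modulus), then label each sample by its connected component."""
    n = len(samples)
    parent = list(range(n))  # union-find forest over sample indices

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if kernel_distance_sq(samples[i], samples[j], kernel) < threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Relabel roots as consecutive cluster ids in order of first appearance.
    label_of_root = {}
    labels = []
    for i in range(n):
        root = find(i)
        if root not in label_of_root:
            label_of_root[root] = len(label_of_root)
        labels.append(label_of_root[root])
    return labels


# Toy data: two well-separated groups of 2-D points (hypothetical).
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = connected_components(samples, gaussian_kernel, threshold=0.5)
num_clusters = len(set(labels))
print(labels, num_clusters)
```

On this toy data the number of clusters is not passed in. It comes out as the number of connected components. In practice the result depends strongly on the kernel width and on the threshold, which is the choice the connected modulus in the thesis is meant to guide.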
Keywords/Search Tags:text clustering, clustering analysis, kernel function, kernel clustering, graph kernel clustering