K-NN, K-means And The Application In Text Mining

Posted on: 2004-06-05  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhan  Full Text: PDF
GTID: 2168360122461132  Subject: Computer applications
Abstract/Summary:
The performance of the K-means clustering algorithm depends on the choice of distance metric. Conventional K-means usually takes the Euclidean distance as its similarity measure, and this distance involves every attribute. When feature weight parameters are introduced into the distance formula, performance depends on the weight values and can therefore be improved by adjusting them. Because K-means is iterative, it is difficult to optimize the clustering result by assigning weight values directly. An indirect feature weight learning algorithm is introduced to improve the clustering result. Mathematically, it corresponds to a linear transformation of a set of points in Euclidean space.

To learn the value of K, this paper primarily uses a Genetic Algorithm to make a better selection. It also proposes a validity function for judging clustering results, with a view to applying it in K-nearest neighbor classification, and then introduces the "Generalization Capability of a case" into K-nearest neighbor. Under the proposed approach, cases with better Generalization Capability are kept as representative cases, while redundant cases that fall within their coverage are removed. This yields a smaller but nearly complete training set and consequently reduces the cost of searching for nearest neighbors.

Building on these ideas, this paper applies the nearest neighbor algorithm to text classification, implements word segmentation and classification on Reuters-21578, incorporates feature weight learning at the same time, and carries out some basic experiments and analysis on Chinese word segmentation.
Keywords/Search Tags: K-nearest neighbor, K-means clustering, Feature weight, Generalization Capability, Text mining
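
The abstract's remark that feature weighting corresponds to a linear transformation of points in Euclidean space can be checked with a small sketch. The abstract does not give the thesis's weighted distance formula, so the form used here, sqrt(sum_j w_j * (x_j - y_j)^2) with non-negative weights, is an assumption, and the function names and sample values are hypothetical. Under that assumption, the weighted distance between two points equals the ordinary Euclidean distance after each feature j is scaled by sqrt(w_j).

```python
import math


def weighted_distance(x, y, weights):
    """Feature-weighted Euclidean distance (assumed form):
    sqrt(sum_j w_j * (x_j - y_j)^2), with all w_j >= 0."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def scale_point(x, weights):
    """Linear transformation: multiply feature j by sqrt(w_j)."""
    return [math.sqrt(w) * a for w, a in zip(weights, x)]


def euclidean_distance(x, y):
    """Plain (unweighted) Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical sample points and weights, chosen only for illustration.
x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
weights = [0.25, 1.0, 0.0]  # a zero weight ignores that feature entirely

d_weighted = weighted_distance(x, y, weights)
d_scaled = euclidean_distance(scale_point(x, weights), scale_point(y, weights))

# Both routes give the same distance, so running standard K-means on the
# scaled points is equivalent to running it with the weighted distance.
print(d_weighted, d_scaled)
```

One practical consequence of this equivalence is that learned weights can be applied once as a preprocessing step, after which an unmodified K-means or K-nearest neighbor implementation can be used on the transformed data.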