Research On Text Categorization Based On Support Vector Machines

Posted on: 2009-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Di
Full Text: PDF
GTID: 2178360242989706
Subject: Signal and Information Processing
Abstract/Summary:
We live in an era of information explosion, and information must be classified so that what is needed can be found quickly within the mass of available data. Text classification technology arose to meet this need, and automatic text classification is the core technology of automatic information classification. Text classification has several distinctive characteristics: text vectors are sparse, their dimensionality is high, and the features are correlated with one another. The support vector machine (SVM) is therefore well suited to text classification and shows considerable potential for it.

This thesis speeds up classification by reducing the number of text vectors: it selects the decisive samples from the original sample set with a density clustering algorithm. Applying a common density clustering algorithm directly is not a good choice, because its high time complexity makes the overall classification process very inefficient. The thesis therefore uses an improved density clustering algorithm that incorporates features of the hierarchical clustering algorithm CURE. The improved algorithm retains density clustering's sensitivity to edge points while reducing the time complexity of the clustering process.

The main work of the thesis is as follows. First, it introduces classification methods and compares them, including a comparative experiment on a document-based and a frequency-based method of estimating probabilities; the results indicate that SVM is a comparatively better method. Second, it proposes a feature selection method based on category frequency, and comparative experiments show that it is a good feature selection method. Third, it proposes a method that uses a density clustering algorithm to extract boundary points, with parameters set dynamically for high-dimensional data. Fourth, it gives a concrete implementation of classification using the boundary points extracted by density clustering.
Keywords/Search Tags: SVM, Text Categorization, Density Clustering Algorithm
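
The sample-reduction idea described in the abstract (keeping only the edge points of each category before training the SVM) might look roughly like the sketch below. It is not the thesis's improved algorithm: the CURE-based speed-up and the dynamic parameters are not reproduced, and the plain DBSCAN-style core/border/noise labelling used here is an assumption, as are the function names, the `eps` and `min_pts` parameters, and the toy data. This naive version compares every pair of points, so it has exactly the quadratic cost the abstract warns about.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def label_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' in the DBSCAN sense.

    A core point has at least min_pts points (itself included) within eps.
    A border point is not core but lies within eps of some core point.
    Everything else is noise.
    """
    n = len(points)
    # neighbours[i] holds the indices within eps of point i (including i).
    neighbours = [
        [j for j in range(n) if euclidean(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbours[i]):
            labels.append("border")
        else:
            labels.append("noise")
    return labels


def select_border_samples(samples_by_class, eps, min_pts):
    """Keep only the border points of each class as a reduced training set."""
    reduced = []
    for cls, points in samples_by_class.items():
        labels = label_points(points, eps, min_pts)
        reduced.extend(
            (p, cls) for p, lab in zip(points, labels) if lab == "border"
        )
    return reduced


# Hypothetical toy data: one category with six 2-D "text vectors".
demo = {
    "sports": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
               (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)],
}
reduced = select_border_samples(demo, eps=1.0, min_pts=3)
```

In this toy run the two ends of the dense run of points are kept and the isolated point is discarded as noise. The reduced set would then be passed to an SVM trainer in place of the full sample set.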