KNN Text Classification Algorithm Based on Semantic Centers

Posted on:2008-04-02

Degree:Master

Type:Thesis

Country:China

Candidate:J Wei

GTID:2208360215497963

Subject:Computer application technology

Abstract/Summary:

This thesis studies algorithms for text categorization and text clustering, analyses several of their key techniques and problems, and proposes some improvements. First, the Vector Space Model and methods for computing term weights are introduced, and several effective feature selection methods are compared. Then two classification algorithms that perform better than the others, SVM and KNN, are analysed in detail. Our experiments with these two methods show that KNN is more stable than SVM, so we adopt KNN in our practical system.

Because KNN is an instance-based algorithm that compares each new document against every training sample, its slow classification speed is a major problem. To overcome this, we propose replacing the document samples with a smaller number of semantic centers. The semantic centers are constructed by text clustering: we describe the nearest neighbour clustering algorithm and its specific problems in detail, and use several techniques for tuning its parameters dynamically to improve clustering quality. For the problem of choosing the initial cluster centroids, we improve an existing algorithm and present the corresponding algorithm flow in detail.

Finally, experiments on several datasets of different sizes evaluate the algorithms above, and the results show that our KNN classification algorithm based on semantic centers greatly improves classification speed while maintaining high precision.
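
The central idea of the abstract, classifying a document against a few per-class semantic centers instead of against every training document, can be illustrated with a small Python sketch. It is a hypothetical illustration, not the thesis's implementation. It assumes pre-tokenized documents and uses smoothed TF-IDF weighting with cosine similarity as one common choice among the weighting schemes the thesis compares. In place of the thesis's nearest neighbour clustering with dynamic parameter tuning and improved centroid initialization, which are not reproduced here, it substitutes a simple single-pass clustering with a fixed, assumed `threshold`. The function names (`fit_idf`, `vectorize`, `semantic_centers`, `train`, `classify`) and the toy data are illustrative only.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical sketch: a fixed-threshold single-pass clustering stands in for the
# thesis's nearest neighbour clustering with dynamic parameter tuning.


def normalize(vec):
    # Scale a sparse vector (term -> weight) to unit L2 length; empty vectors pass through.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(a, b):
    # For unit-length vectors, cosine similarity is just the dot product.
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def fit_idf(docs):
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1, an assumed weighting choice.
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def vectorize(doc, idf):
    # Unit-length TF-IDF vector; terms never seen in training are dropped.
    tf = Counter(t for t in doc if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def semantic_centers(vectors, threshold):
    # Each vector joins its most similar center if similarity >= threshold,
    # otherwise it starts a new cluster. A center is the normalized sum of members.
    sums, centers = [], []
    for v in vectors:
        best, best_sim = -1, -1.0
        for i, c in enumerate(centers):
            sim = cosine(v, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            for t, w in v.items():
                sums[best][t] = sums[best].get(t, 0.0) + w
            centers[best] = normalize(sums[best])
        else:
            sums.append(dict(v))
            centers.append(normalize(v))
    return centers


def train(labeled_docs, threshold=0.2):
    # Replace each class's training documents with that class's semantic centers.
    idf = fit_idf([doc for doc, _ in labeled_docs])
    by_label = defaultdict(list)
    for doc, label in labeled_docs:
        by_label[label].append(vectorize(doc, idf))
    centers = [(c, label) for label, vecs in by_label.items()
               for c in semantic_centers(vecs, threshold)]
    return idf, centers


def classify(doc, idf, centers, k=3):
    # KNN over the centers: similarity-weighted vote of the k most similar ones.
    v = vectorize(doc, idf)
    nearest = heapq.nlargest(k, ((cosine(v, c), label) for c, label in centers),
                             key=lambda pair: pair[0])
    votes = defaultdict(float)
    for sim, label in nearest:
        votes[label] += sim
    return max(votes, key=votes.get)


if __name__ == "__main__":
    train_docs = [
        ("stock market shares rise".split(), "finance"),
        ("bank interest rates market".split(), "finance"),
        ("shares bank profit".split(), "finance"),
        ("football match goal team".split(), "sport"),
        ("team wins football league".split(), "sport"),
        ("tennis match player wins".split(), "sport"),
    ]
    idf, centers = train(train_docs, threshold=0.2)
    print(len(centers), "centers replace", len(train_docs), "training documents")
    print(classify("market shares fall".split(), idf, centers))  # → finance
    print(classify("football team goal".split(), idf, centers))  # → sport
```

In this sketch the per-document classification cost grows with the number of centers rather than the number of training documents, which is the source of the speed-up the abstract describes. Raising `threshold` yields more, tighter centers and moves the classifier back toward plain sample-based KNN.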

Keywords/Search Tags:

Text Categorization, Text Clustering, Feature Selection, Clustering Centroids Initialization, K-nearest Neighbor, Semantic Center

Related items

1	The Research On K-nearest Neighbor Chinese Text Categorization Algorithm
2	Design And Implementation Of Kazak Text Categorization System
3	The Research Of Text Representation And Feature Selection In Text Categorization
4	Design And Realization Of Text Categorization System
5	A Study On Chinese Text Categorization
6	Multi-class Scientific Literature Automatic Categorization System
7	Research Of Text Clustering Based On Genetic Algorithm
8	A Study On Text Categorization Based On Machine Learning
9	The Text Categorization Algorithm Based On Nearest Subspace Search
10	Precise Clustering Algorithm For Chinese Text Based On K-means