KNN Text Classification Algorithm Based on Semantic Centers

Posted on: 2008-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Wei
Full Text: PDF
GTID: 2208360215497963
Subject: Computer application technology
Abstract/Summary:
This thesis studies algorithms for text categorization and text clustering. We analyse several critical techniques and problems and propose some improvements. First, we introduce the Vector Space Model and methods for computing term weights, and compare several effective feature selection methods. We then analyse two classification algorithms in detail, SVM and KNN, which perform better than the others we considered. Our experiments on these two methods show that KNN is more stable than SVM, so we adopt KNN in our practical system.

Because KNN is an instance-based algorithm, its slow classification speed is a major problem. To overcome it, we propose replacing the document samples with a smaller number of semantic centers. Text clustering is used to construct the semantic centers, and we describe the nearest neighbour clustering algorithm and its specific problems in detail. Several means of tuning parameters dynamically are used to optimize clustering quality. For the problem of initializing the clustering centroids, we improve an existing algorithm and present the details of the corresponding algorithm flow.

Finally, our experiments evaluate the above algorithms on several datasets of different sizes. The results show that our KNN classification algorithm based on semantic centers greatly improves classification speed while retaining high precision.
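
The idea of classifying against semantic centers rather than raw samples can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`cosine`, `build_centers`, `classify`), the fixed similarity `threshold`, the per-class single-pass clustering, and the similarity-weighted vote are all assumptions made for illustration, and the dynamic parameter tuning and improved centroid initialization mentioned in the abstract are not reproduced.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_centers(samples, threshold=0.8):
    """Single-pass nearest-neighbour clustering, run separately per class.

    samples: list of (vector, label) pairs.
    threshold: assumed fixed similarity cut-off for joining a cluster.
    Returns a list of (centroid, label) pairs, the "semantic centers".
    """
    clusters = defaultdict(list)  # label -> list of [sum_vector, count]
    for vec, label in samples:
        best, best_sim = None, threshold
        for cluster in clusters[label]:
            centroid = [s / cluster[1] for s in cluster[0]]
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # No existing center is similar enough: start a new cluster.
            clusters[label].append([list(vec), 1])
        else:
            # Merge the document into its nearest cluster.
            best[0] = [s + x for s, x in zip(best[0], vec)]
            best[1] += 1
    return [([s / n for s in total], label)
            for label, group in clusters.items()
            for total, n in group]


def classify(vec, centers, k=3):
    """KNN over the centers, with votes weighted by cosine similarity."""
    nearest = sorted(centers, key=lambda c: cosine(vec, c[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for centroid, label in nearest:
        votes[label] += cosine(vec, centroid)
    return max(votes, key=votes.get)


# Toy two-term vectors; real inputs would be weighted term vectors.
samples = [([1.0, 0.0], "a"), ([0.9, 0.1], "a"),
           ([0.0, 1.0], "b"), ([0.1, 0.9], "b")]
centers = build_centers(samples, threshold=0.8)
label = classify([0.8, 0.2], centers, k=1)
```

Under these assumptions, each query is compared against the centers only, so the cost per classification scales with the number of centers rather than the number of training documents.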
Keywords/Search Tags: Text Categorization, Text Clustering, Feature Selection, Clustering Centroids Initialization, K-nearest Neighbor, Semantic Center