The Research On K-nearest Neighbor Chinese Text Categorization Algorithm

Posted on:2011-02-09

Degree:Master

Type:Thesis

Country:China

Candidate:T Lu

Full Text:PDF

GTID:2178360308973011

Subject:Computer software and theory

Abstract/Summary:

Automatic text categorization has become an active research topic because it supports searching for and extracting information of a particular category from large data sources. K-nearest neighbor (KNN) is an important method for automatic text classification, and it handles large data sets with relatively good stability. Building on a comprehensive overview of Chinese text categorization, this thesis focuses on the KNN algorithm. The main contents of the thesis are as follows:

(1) The thesis summarizes the research background and the development status of text classification algorithms. It introduces the general process of Chinese text categorization, including its key technologies and the methods used to assess its quality.

(2) KNN text categorization is slow when processing large-scale data. To address this problem, a kernel-KNN algorithm based on KNN categorization is proposed. It introduces the semantic relations among feature items and uses clustering to build center documents. This reduces the number of documents that KNN has to search and so increases the speed of categorization. Simulation results show that the proposed algorithm improves classification speed.

(3) A categorization method is proposed to reduce the effect that an uneven distribution of samples in the training set has on text categorization. Based on k-KNN, it uses small values of K when testing training-set documents that lie near the boundaries between classes, so that each such document is assigned to the correct class. This reduces misclassification at class boundaries. Experiments show that the method performs well.
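The idea in (2), searching only the documents that belong to the nearest center documents, can be illustrated with a minimal Python sketch. This is not the thesis's kernel-KNN algorithm: the abstract gives no details of its clustering or its semantic similarity measure, so the sketch assumes one center per class (the average of that class's vectors), plain cosine similarity over sparse term-weight dictionaries, and hypothetical names such as `build_centers`, `pruned_knn_classify`, and `n_centers`.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_centers(train):
    """Average the vectors of each class into one center document.

    Assumption: one center per class label. The thesis builds its
    centers by clustering, which is not reproduced here.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for vec, label in train:
        counts[label] += 1
        for term, weight in vec.items():
            sums[label][term] += weight
    return {
        label: {t: w / counts[label] for t, w in terms.items()}
        for label, terms in sums.items()
    }


def knn_classify(query, train, k):
    """Plain KNN: majority vote among the k most similar documents."""
    ranked = sorted(train, key=lambda d: cosine(query, d[0]), reverse=True)[:k]
    votes = Counter(label for _, label in ranked)
    return votes.most_common(1)[0][0]


def pruned_knn_classify(query, train, centers, k, n_centers):
    """Run KNN only over documents whose center is among the nearest.

    Comparing the query with a few centers first shrinks the set of
    documents that the KNN step has to scan.
    """
    nearest = sorted(
        centers, key=lambda c: cosine(query, centers[c]), reverse=True
    )[:n_centers]
    keep = set(nearest)
    candidates = [d for d in train if d[1] in keep]
    return knn_classify(query, candidates, k)


# Toy data: (term-weight vector, class label); all values are made up.
train = [
    ({"ball": 1.0, "team": 1.0}, "sports"),
    ({"ball": 1.0, "goal": 1.0}, "sports"),
    ({"stock": 1.0, "market": 1.0}, "finance"),
    ({"stock": 1.0, "bank": 1.0}, "finance"),
]
centers = build_centers(train)
query = {"ball": 1.0, "goal": 0.5}
full_result = knn_classify(query, train, k=3)
pruned_result = pruned_knn_classify(query, train, centers, k=3, n_centers=1)
```

In this toy run the pruned search compares the query with two centers and then scans two documents instead of four. Any real speedup depends on how the centers are built and on how many of them are kept, which the abstract does not specify.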

Keywords/Search Tags:

KNN Text Categorization, Clustering, Semantic Similarity, Training Set

Related items

1	Research On The Method Of Text Categorization Based On Semantic Similarity
2	Research On Semantic Similarity Based On Text Categorization
3	Research On Ontology-Based Semantic Text Categorization
4	Research On Text Clustering Based On Semantic Similarity
5	Study On HowNet Ontology Based Text Categorization Algorithm And Its Application
6	Research And Implementation Of Chinese Text Categorization System Based On Semantic Similarity
7	Research On Thesis Text Clustering Based On Semantic Similarity
8	Search Of Group Intelligent Text Clustering Methods Based On Semantic Similarity
9	Clustering Algorithm Research Of Short Text Based On Semantic Similarity
10	Research On Document Clustering Based On Semantic Similarity Of Hownet