
Improvement Of KNN And Its Application To Text Classification

Posted on: 2010-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: F J Bo
Full Text: PDF
GTID: 2178360278975506
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the volume of text information is growing quickly, and obtaining useful information from it is becoming more and more important; text mining helps with this task. Text classification is a key technology of text mining, so research on text classification is extremely important.

KNN is widely used in machine learning and data mining owing to its simplicity and robustness, and it has been shown to be one of the best methods for vector space models. However, KNN has a shortcoming: classification efficiency falls as the number of training samples and attributes increases.

To address this shortcoming, this thesis presents an improved KNN algorithm named PKNN, based on Projection Pursuit theory and the iDistance index structure. PKNN first picks a candidate set of training samples by searching over a one-dimensional projection distance, and then obtains the K nearest neighbors by computing the similarity between the test sample and the selected training samples. Because the selected candidate set is far smaller than the whole training set, PKNN improves classification efficiency.

This thesis first introduces the general situation and research status of text classification, then analyzes the preprocessing stage of text classification in detail. After further research on KNN, the improved PKNN algorithm is presented, and a Chinese text classification system is designed based on PKNN. The system consists of a training module, a classification module, and an evaluation module; its functions include removing stop words from texts, feature selection, computing feature weights, and classification. The system offers two feature selection methods and two classification methods to choose from. Finally, experiments are designed to validate the efficiency and precision of PKNN.
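To make the two-stage idea concrete, the following is a minimal sketch of the kind of filter-then-refine KNN described above: candidates are first selected by distance along a single one-dimensional projection, and exact cosine similarity is computed only against those candidates. It is not the thesis's implementation; the projection direction (here a fixed random axis rather than one chosen by projection pursuit), the `candidate_size` parameter, and the use of plain sorting instead of an iDistance index are all assumptions made for illustration.

```python
import numpy as np

def pknn_classify(X_train, y_train, x_test, k=5, candidate_size=50, projection=None):
    """Sketch of a projection-filtered KNN classifier (hypothetical PKNN-style flow).

    X_train: (n, d) array of training document vectors
    y_train: (n,) array of class labels
    x_test:  (d,) test document vector
    """
    if projection is None:
        # Assumption: a fixed random unit direction stands in for a
        # projection-pursuit-chosen axis.
        rng = np.random.default_rng(0)
        projection = rng.standard_normal(X_train.shape[1])
        projection /= np.linalg.norm(projection)

    # One-dimensional coordinates of all documents along the projection.
    train_proj = X_train @ projection
    test_proj = x_test @ projection

    # Stage 1: cheap filter -- keep the training documents whose 1-D
    # projection lies closest to the test document.
    candidate_idx = np.argsort(np.abs(train_proj - test_proj))[:candidate_size]

    # Stage 2: exact cosine similarity against the candidates only.
    cands = X_train[candidate_idx]
    sims = (cands @ x_test) / (
        np.linalg.norm(cands, axis=1) * np.linalg.norm(x_test) + 1e-12
    )
    top_k = candidate_idx[np.argsort(-sims)[:k]]

    # Majority vote among the K nearest candidates.
    labels, counts = np.unique(y_train[top_k], return_counts=True)
    return labels[np.argmax(counts)]
```

Because the similarity computation touches only `candidate_size` vectors instead of all n training documents, the per-query cost drops roughly in proportion to the candidate ratio, which is the efficiency gain the abstract attributes to PKNN.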
Keywords: Text Classification, Feature Selection, K-Nearest Neighbors, Dimensionality Reduction, Projection Pursuit