Adaptive Weighted KNN Text Classification

Posted on: 2015-12-22 | Degree: Master | Type: Thesis
Country: China | Candidate: F L Wu | Full Text: PDF
GTID: 2298330422489866 | Subject: Computer technology
Abstract/Summary:
As an important component of natural language processing, automatic text classification is used to organize and manage large amounts of text data and is widely applied in areas such as information retrieval, document filtering, and word discrimination. The main techniques of text classification include feature selection, feature weighting, dimensionality reduction, text representation, and classification algorithms. Because text classifiers incur high time and space complexity on high-dimensional, large data sets, reducing the dimensionality of the text representation and improving classifier design are active research topics in text classification.

The K-Nearest Neighbor (KNN) algorithm is conceptually simple and classifies well, and it is one of the most widely used classifiers in the field. However, classical KNN is inefficient on large-scale text classification tasks, and it is easily misled because it distinguishes neither key features from ordinary features nor key samples from ordinary samples. This thesis analyzes these defects of KNN text classification. The main work is as follows:

1) To address the long classification time and low classification accuracy of classical KNN in high-dimensional feature spaces, an adaptive feature-weighted KNN text classification algorithm is proposed. First, the algorithm takes overall classification accuracy as the optimization objective function and imposes a normalization constraint on the feature weights. Second, it solves for the feature weights with an improved Particle Swarm Optimization (PSO) algorithm that uses normalization-constrained step attenuation. Finally, it computes text similarity and reduces the feature dimension based on the weights.
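The last step of the feature-weighted algorithm (weighted similarity plus weight-based dimension reduction) can be illustrated with a minimal sketch. The abstract does not give the thesis's formulas, so everything here is an assumption: the similarity is taken to be a weighted cosine, the normalization constraint is taken to mean that the weights sum to 1, and the function names `normalize_weights`, `weighted_cosine`, `select_features`, and `project` are hypothetical. The weights are supplied by hand rather than learned by PSO.

```python
import math


def normalize_weights(weights):
    """Scale non-negative feature weights so they sum to 1 (assumed constraint)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def weighted_cosine(x, y, weights):
    """Cosine similarity in which each feature's contribution is scaled by its weight."""
    dot = sum(w * a * b for w, a, b in zip(weights, x, y))
    norm_x = math.sqrt(sum(w * a * a for w, a in zip(weights, x)))
    norm_y = math.sqrt(sum(w * b * b for w, b in zip(weights, y)))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def select_features(weights, keep):
    """Return the indices of the `keep` highest-weighted features, in index order."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:keep])


def project(vector, indices):
    """Keep only the selected feature dimensions of a document vector."""
    return [vector[i] for i in indices]


# Hand-picked weights standing in for the PSO-optimized ones.
weights = normalize_weights([4.0, 1.0, 0.0, 5.0])
kept = select_features(weights, 2)        # the two most heavily weighted features
reduced = project([9, 8, 7, 6], kept)     # document vector in the reduced space
```

In this sketch a feature with weight 0 contributes nothing to the similarity, which is why dropping low-weight dimensions changes the result only slightly while shrinking every vector that KNN has to compare.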
In 10-fold cross-validation on the Fudan University corpus, a Chinese classification corpus for the tourism domain, and a Chinese classification corpus for the sports domain, the improved algorithm increased classification accuracy and reduced classification time.

2) A study of the KNN classification process shows that the key to improving the algorithm's efficiency is to reduce the number of similarity computations. An adaptive sample-weighted KNN text classification algorithm is therefore proposed. First, the algorithm uses the improved PSO algorithm to solve for the sample weights adaptively. Then, the number of samples is reduced according to those weights. Finally, the sample weights are incorporated into the KNN discrimination function, so that KNN's sensitivity to the stored sample database can be resolved. The algorithm achieves very good results on the TanCorpMin corpus.

3) To address the problems KNN faces with high-dimensional feature spaces and large data sets, an adaptive weighted KNN text classification algorithm that combines the two improved algorithms above is proposed. First, it weights the features, reduces the dimension of the feature space, updates the feature vocabulary, and re-vectorizes the samples. Second, it weights the samples and prunes them. Finally, it applies the weighted discrimination function in the text classifier. Experiments show that the combined algorithm effectively reduces the time and space complexity of classification.
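The sample-weighted steps (prune samples by weight, then use the weights in the KNN discrimination function) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's method: the abstract does not define the discrimination function, so each of the k nearest neighbors is assumed to vote with `weight * similarity`, pruning is assumed to be a simple threshold, and the names `cosine`, `prune_samples`, and `classify` are hypothetical. The sample weights are given by hand rather than solved by PSO.

```python
import math
from collections import defaultdict


def cosine(x, y):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def prune_samples(samples, min_weight):
    """Drop training samples whose weight is below min_weight (assumed pruning rule)."""
    return [s for s in samples if s[2] >= min_weight]


def classify(query, samples, k):
    """Classify `query` from (vector, label, weight) samples.

    Each of the k most similar samples votes for its label with
    weight * similarity (assumed form of the weighted discrimination function).
    """
    if not samples:
        raise ValueError("at least one training sample is required")
    scored = sorted(
        ((cosine(query, vec), label, weight) for vec, label, weight in samples),
        key=lambda item: item[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, label, weight in scored[:k]:
        votes[label] += weight * similarity
    return max(votes, key=votes.get)


# Toy training set: (vector, label, weight); the weights are made up.
samples = [
    ([1.0, 0.0], "sports", 0.9),
    ([0.9, 0.1], "sports", 0.8),
    ([0.0, 1.0], "tourism", 0.7),
    ([0.1, 0.9], "tourism", 0.05),
]
pruned = prune_samples(samples, 0.1)          # removes the 0.05-weight sample
label = classify([1.0, 0.2], pruned, k=3)
```

Pruning shrinks the set that every query must be compared against, which is where the saving in similarity computations comes from; the weighted vote then lets a reliable sample outweigh an equally close but less reliable one.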
Keywords/Search Tags: KNN, Feature weighted, Sample weighted, Normalized, Particle swarm optimization