
Research On KNN Text Classification Algorithm

Posted on: 2017-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Tian
GTID: 2348330536476741
Subject: Computer application technology
Abstract/Summary:
With the development of network technology, people have entered the network era, and the amount of available information has exploded. This wealth of information brings its own problem: the sheer volume is cluttered and redundant, which makes it difficult to find the information one actually needs efficiently. Text classification, a key technology for processing large amounts of text, has developed rapidly in recent years and is widely used in information organization, information retrieval, semantic analysis, topic tracking, and digital libraries. There are many text classification algorithms, such as the KNN algorithm, support vector machines, and Bayesian classifiers, and each has its own advantages in practical applications.

The KNN algorithm is a classical statistical pattern recognition method and one of the best-performing methods for text classification. Its idea was proposed by Cover and Hart in 1967. Later researchers have improved it to address its shortcomings, for example by pruning the training sample set or by modifying the similarity formula. These methods still have problems, however: the class-center method for pruning the training samples ignores the uneven distribution of samples, and the improved similarity formulas still operate in a relatively high-dimensional space.

This thesis improves the KNN algorithm in two respects, the similarity formula and the method for pruning the training sample set. It proposes an improved algorithm based on the M operator method and an improved ICNN algorithm based on a logo sample generation strategy. The M operator method increases the weight of the most relevant feature terms in low-dimensional text classification, which improves both classification accuracy and classification speed. The logo sample generation algorithm takes into account how evenly the samples are distributed and prunes the training set effectively, reducing the amount of computation and the classification time while maintaining classification accuracy.
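The abstract does not define the M operator or the logo sample generation strategy, so the sketch below shows only the two baselines the thesis builds on: similarity-weighted KNN over bag-of-words vectors with cosine similarity, and Hart's condensed nearest neighbour (CNN) rule, a classic way of cutting the training set. It is not the thesis's method. The function names (`tf_vector`, `cosine`, `knn_classify`, `condense`) are hypothetical, and the sketch assumes whitespace tokenization and raw term-frequency weights, where a real system would more likely use TF-IDF and proper preprocessing.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector (assumption: lowercase + whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(train, query, k=3):
    """Similarity-weighted KNN: each of the k training documents most
    similar to the query votes for its label with weight = similarity."""
    neighbors = sorted(train, key=lambda item: cosine(item[0], query),
                       reverse=True)[:k]
    scores = Counter()
    for vec, label in neighbors:
        scores[label] += cosine(vec, query)
    return scores.most_common(1)[0][0]


def condense(train):
    """Hart's CNN rule: grow a subset until 1-NN over the subset labels
    every training sample correctly. Samples already kept are skipped,
    so the loop terminates even with contradictory duplicates."""
    kept = [0]  # indices into train, seeded with the first sample
    changed = True
    while changed:
        changed = False
        for i, (vec, label) in enumerate(train):
            if i in kept:
                continue
            nearest = max(kept, key=lambda j: cosine(train[j][0], vec))
            if train[nearest][1] != label:  # misclassified: keep it
                kept.append(i)
                changed = True
    return [train[j] for j in kept]


if __name__ == "__main__":
    docs = [
        ("stock market shares rise", "finance"),
        ("bank interest rates fall", "finance"),
        ("football team coach player", "sports"),
        ("team wins football match", "sports"),
    ]
    train = [(tf_vector(text), label) for text, label in docs]
    print(knn_classify(train, tf_vector("football match goal")))  # sports
    print(len(condense(train)), "of", len(train), "samples kept")
```

Condensing trades training-set size for classification time: `knn_classify` then compares each query against fewer documents. Plain CNN is order-dependent and, like the class-center pruning the abstract criticizes, does not explicitly consider how evenly each class is distributed.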
Keywords/Search Tags: Text classification technology, KNN Algorithm, Accuracy, Dimension