
Research On The Similarity Of K-nearest Neighbor Algorithm

Posted on: 2015-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: D Pan
Full Text: PDF
GTID: 2298330431481913
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Nowadays, information is wealth and power, but it is impossible to command all of it. It has therefore become important to extract useful information from a variety of data sources. Traditional information technology can no longer meet today's needs in text management, and text mining technology has emerged in response to this growing demand. Text classification is the most basic task in text mining: it helps us retrieve, query, and filter text messages, and thereby increases the availability of useful information. The objective of text classification is to assign original texts to several pre-defined categories based on the information they contain. Such problems arise in many areas, including web search, office automation, news website search, and subject index classification. Text classification is one of the key technologies in digital libraries, information filtering, and search engines.

The KNN algorithm, a non-parametric classification method, is widely used in machine learning because it is simple and effective. This thesis studies the influence of the similarity measure on the KNN algorithm in text classification, and compares the effect of different distance functions on KNN classification. The similarity formula of the traditional KNN algorithm has two drawbacks: it does not consider the distance from each test sample to the center of each class in the training set, and it does not take into account the features characteristic of the different categories. To address these two drawbacks, this thesis puts forward a new method of calculating similarity. The experimental results show that the new similarity formula does improve the accuracy of text categorization.
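To illustrate how the choice of similarity can change a KNN decision, the following is a minimal sketch of the traditional algorithm with a pluggable similarity function. It is not the new formula proposed in the thesis, which the abstract does not state. The function names, the toy term-count vectors, and the labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def negative_euclidean(a, b):
    """Euclidean distance, negated so that a larger value means 'more similar'."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, train_vectors, train_labels, k, similarity):
    """Return the majority label among the k training vectors most similar to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: similarity(query, pair[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical toy corpus: each vector holds counts of three terms in one document.
train_vectors = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]]
train_labels = ["sports", "sports", "finance", "finance"]

# A short document that uses only the second term.
query = [0, 1, 0]

by_cosine = knn_classify(query, train_vectors, train_labels, 1, cosine_similarity)
by_euclid = knn_classify(query, train_vectors, train_labels, 1, negative_euclidean)
print(by_cosine, by_euclid)
```

On this toy data the two measures disagree for k = 1: cosine similarity ignores document length and picks the "finance" document that shares the query's term, while Euclidean distance favors the shortest training vector, which is labeled "sports". This kind of sensitivity is one reason the similarity formula matters for KNN text classification.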
Keywords/Search Tags: KNN Algorithm, Similarity Formula, Characteristics