
Application Of Natural Neighbor In Text Classification

Posted on: 2018-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2348330536968746
Subject: Engineering
Abstract/Summary:
With the rapid development of information technology, the amount of information on the Internet is growing explosively, and it has become necessary to analyze and mine the value contained in this mass of Internet data. Text classification is one of the most common methods in data mining, and this paper studies text classification technology in detail.

In the field of text classification, the K nearest neighbor (KNN) text classification algorithm was developed from nearest neighbor technology. However, the traditional KNN text classification algorithm has two disadvantages. First, the value of K is difficult to determine: an unreasonable setting strongly affects the classification results and reduces the accuracy of the algorithm, and there is no established experience to guide the choice of K across different text data sets, which causes researchers considerable trouble. Second, the classification results are strongly affected by the distribution of the text data set: when the training text data set is severely unevenly distributed, the classification performance is poor. This paper proposes the idea of natural neighbor and applies it to text classification, which overcomes these shortcomings of KNN text classification.

The main work of this paper is as follows:

Firstly, this paper studies and analyzes the background and significance of text classification, and summarizes the research status of its key technologies.

Secondly, this paper introduces the detailed process and steps of text classification and summarizes the classical techniques used in each step. It also discusses several common text classification algorithms and analyzes their advantages and disadvantages.

Thirdly, this paper introduces the concept of nearest neighbor technology and its application in text classification, and discusses the disadvantages of the nearest neighbor technique in detail. Because the parameter of the nearest neighbor method is uncertain and the method is sensitive to the distribution of the data set, this paper puts forward the idea of natural neighbor. The natural stable state defined in the original natural neighbor algorithm is improved, and the resulting algorithm adaptively obtains the natural neighbors of a data set without any parameters. Finally, the characteristics of natural neighbor are analyzed and summarized, and the feasibility of the natural neighbor algorithm for high-dimensional data is verified.

Fourthly, a text classification algorithm based on natural neighbor (TCbNaN) is proposed. First, a natural-neighbor-based algorithm reassigns the weights of the training set, yielding a weight for each text vector in the training set. Then, a natural neighbor text classification algorithm based on this weight assignment information is proposed. Finally, the superiority of the TCbNaN algorithm is verified by comparison with the KNN and NaN algorithms.
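
The parameter-free neighbor search described above can be illustrated with a minimal sketch. The abstract does not give the thesis's exact procedure or its improved stable-state condition, so the code below follows the commonly described form of natural neighbor search as an assumption: the neighborhood size r grows from 1 until every point is someone's neighbor, or until the number of points with no reverse neighbor stops changing. The function name `natural_neighbor_search`, the Euclidean distance, and the stopping rule are illustrative choices, not the author's TCbNaN implementation; text vectors are often compared with cosine similarity instead.

```python
import numpy as np


def natural_neighbor_search(X):
    """Sketch of a natural neighbor search (assumed form, not the thesis's exact method).

    Returns (r, natural), where r is the neighborhood size at which the search
    stopped and natural[i] is the set of points that are mutual r-nearest
    neighbors of point i.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two points")

    # Pairwise Euclidean distances (assumption: Euclidean metric).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)      # a point is never its own neighbor
    order = np.argsort(dist, axis=1)    # order[i, 0] is the nearest other point to i

    knn = [set() for _ in range(n)]     # knn[i]: the r nearest neighbors of i
    rnn = [set() for _ in range(n)]     # rnn[j]: points that count j among their r nearest
    prev_orphans = None

    for r in range(1, n):
        # Extend every neighborhood by one more neighbor.
        for i in range(n):
            j = int(order[i, r - 1])
            knn[i].add(j)
            rnn[j].add(i)

        # "Orphans" are points that no other point has chosen as a neighbor yet.
        orphans = sum(1 for s in rnn if not s)
        if orphans == 0 or orphans == prev_orphans:
            break                       # assumed stable state reached
        prev_orphans = orphans

    # Natural neighbors are mutual: j is near i and i is near j.
    natural = [knn[i] & rnn[i] for i in range(n)]
    return r, natural


# Two well-separated pairs: the search stops at r == 1, with points 0 and 1
# as mutual neighbors and points 2 and 3 as mutual neighbors.
r, natural = natural_neighbor_search([[0, 0], [0, 1], [10, 0], [10, 1]])
print(r, natural)
```

Unlike KNN, no value of K is supplied here: the neighborhood size is determined by the data itself, which is the property the abstract relies on. The weight assignment step of TCbNaN is not sketched because the abstract does not specify it.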
Keywords/Search Tags:text classification, natural nearest neighbor, k-nearest neighbor, weight assignment