Application Of Natural Neighbor In Text Classification

Posted on:2018-03-16

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhang

GTID:2348330536968746

Subject:Engineering

Abstract/Summary:

With the rapid development of information technology, the amount of information on the Internet is growing explosively, and it has become necessary to analyze and mine the value contained in this mass of Internet data. Text classification is one of the most common methods in data mining, and this paper studies text classification technology in detail.

In text classification, the K-nearest-neighbor (KNN) algorithm builds on nearest neighbor techniques. However, the traditional KNN text classification algorithm has two disadvantages. First, the value of K is difficult to determine: an unreasonable setting strongly affects the classification results and reduces accuracy, and there is no established experience to guide the choice of K across different text data sets, which causes researchers considerable trouble. Second, the classification results depend heavily on the distribution of the text data set: when the training set is severely unevenly distributed, classification performance is poor. This paper proposes the idea of natural neighbor and applies it to text classification, which overcomes these shortcomings of KNN text classification.

The main work of this paper is as follows.

Firstly, this paper studies and analyzes the background and significance of text classification, and summarizes the research status of its key technologies.

Secondly, this paper introduces the detailed process and steps of text classification and summarizes the classical techniques used in each step. It also discusses several common text classification algorithms and analyzes their advantages and disadvantages.

Thirdly, this paper introduces the concept of nearest neighbor technology and its application in text classification, and discusses the disadvantages of the nearest neighbor technique in detail. Because the nearest neighbor method depends on an uncertain parameter and is sensitive to the distribution of the data set, this paper puts forward the idea of natural neighbor. The natural stable state used by the original natural neighbor algorithm is improved, and the resulting algorithm adaptively obtains the natural neighbors of a data set without any parameters. Finally, the characteristics of natural neighbor are analyzed and summarized, and the feasibility of the natural neighbor algorithm for high-dimensional data is verified.

Fourthly, a text classification algorithm based on natural neighbor (TCbNaN) is proposed. First, a natural-neighbor-based algorithm reassigns weights over the training set, yielding a weight for each training text vector. Then, a natural neighbor text classification algorithm that uses this weight assignment information is proposed. Finally, comparison with the KNN and natural neighbor (NaN) algorithms verifies the superiority of the TCbNaN algorithm.
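
To make the parameter-free idea concrete, the following is a minimal sketch of a natural neighbor search as it is commonly described: the neighborhood size r grows round by round until the number of points that nobody has chosen as a neighbor stops changing, and two points are natural neighbors when each lies among the other's r nearest neighbors. The abstract does not give the thesis's improved stable-state condition or the TCbNaN weighting scheme, so the stopping rule, the brute-force Euclidean distance, and the name `natural_neighbors` are all assumptions made for illustration.

```python
import math


def natural_neighbors(points):
    """Parameter-free natural neighbor search (illustrative sketch only).

    Returns (r, neighbors): the neighborhood size at which the search
    stopped, and for each point the set of indices of its natural neighbors.
    """
    n = len(points)
    # Rank all other points by Euclidean distance (brute force; assumed metric).
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j, i=i: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]  # r nearest neighbors found so far
    reverse_count = [0] * n          # how many points have chosen point i
    prev_orphans = None
    r = 0
    while r < n - 1:
        # Extend every neighborhood by one more nearest neighbor.
        for i in range(n):
            j = order[i][r]
            knn[i].add(j)
            reverse_count[j] += 1
        r += 1
        orphans = sum(1 for c in reverse_count if c == 0)
        # Assumed stable state: no unchosen points remain, or their number
        # did not change since the previous round.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    # Natural neighbors are mutual: j is in i's neighborhood and vice versa.
    neighbors = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, neighbors


if __name__ == "__main__":
    # Four points on a line plus one distant outlier (made-up data).
    data = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0), (3.3, 0.0), (100.0, 0.0)]
    size, nbrs = natural_neighbors(data)
    print(size, nbrs)
```

In this toy data the search stops on its own at r = 2, and the distant point ends up with no natural neighbors because none of the clustered points ever selects it. No value of K is supplied at any stage, which is the contrast with KNN that the abstract draws.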

Keywords/Search Tags:

text classification, natural nearest neighbor, k-nearest neighbor, weight assignment

Related items

1	Application Of Natural Neighbor In Text Classification
2	Study On Generalized Nearest Neighbor Pattern Classification
3	Research On Several Pattern Classification Methods Based On K-nearest Neighbor Criterion
4	Nearest Neighbor Classification Improved Algorithm
5	Optimization Research Of Density Peaks Clustering Algorithm Based On Neighbor Searching
6	Research Of Nearest Neighbor Classification Algorithm Based On Sample Selection
7	An Outlier Detection Algorithm Based On Natural Nearest Neighbor
8	Research On The Visual Group K-Nearest Neighbor And Group Inverse K-Nearest Neighbor Query Of Multi-Source Objects In Three-Dimensional Space
9	Research On Continuous Nearest Neighbor Query
10	Research On Personalized Recommendation Algorithm Combining User Attributes And User-centric Natural Nearest Neighbor