Research Of Patent Text Classification Algorithm Based On KNN

Posted on: 2013-02-14 | Degree: Master | Type: Thesis
Country: China | Candidate: D W Yuan
GTID: 2268330392461690 | Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet and information technology, the volume of semi-structured and unstructured text has grown enormously, and obtaining useful information from it quickly and accurately has become an urgent problem. Patent technology has become a core element of the competitiveness of countries and regions, and, faced with vast quantities of patent information, there is a growing trend toward classifying patents with text classification techniques.

First, this paper reviews the current state of text classification research and the background of patent classification. Second, it systematically presents the key technologies of text classification, the main classification algorithms, and their applications in different fields. Among the many existing classifiers, KNN currently performs comparatively well, but it still has shortcomings such as slow classification and relatively low accuracy. To overcome these shortcomings, we propose an optimized KNN classifier whose classification module consists of three parts: training, classification, and evaluation.

The optimized KNN algorithm first clusters the training set in the original vector space, so that similar texts form clusters. Each cluster is treated as a single representative text, and a center vector is computed for it. A threshold is then set, clusters exceeding the threshold are processed accordingly, and the training set is rebuilt. For classification, the algorithm exploits the sparsity of text vectors: it stores the original text information in a compressed representation, computes distances on that representation, and finally assigns each test text to the category it belongs to.
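The abstract names these steps but not the concrete choices behind them. The following Python sketch is one possible reading under stated assumptions, not the thesis's implementation: each text is a sparse term-weight dictionary (only nonzero weights are stored, standing in for the unspecified compressed representation), similarity is cosine, clustering is a simple single-pass scheme run separately within each category, and the threshold is assumed to be the minimum cosine similarity a text needs to join an existing cluster. Each cluster is then replaced by its centroid before a similarity-weighted k-nearest-neighbour vote. The names `reduce_training_set`, `knn_classify`, and `threshold` are hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict

# Assumption: a text is a sparse dict {term: weight}; only nonzero weights are
# stored, standing in for the thesis's unspecified compressed representation.


def cosine(a, b):
    """Cosine similarity of two sparse vectors; the dot product loops over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of sparse vectors, kept sparse."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def reduce_training_set(docs, threshold):
    """Hypothetical single-pass clustering within each category.

    docs is a list of (vector, label). A text joins the most similar cluster of its
    own category if cosine(text, centroid) >= threshold (assumed meaning of the
    threshold); otherwise it starts a new cluster. Each cluster is returned as its
    centroid, i.e. one representative text per cluster.
    """
    clusters = defaultdict(list)  # label -> list of [members, centroid]
    for vec, label in docs:
        best, best_sim = None, -1.0
        for cl in clusters[label]:
            sim = cosine(vec, cl[1])
            if sim > best_sim:
                best, best_sim = cl, sim
        if best is not None and best_sim >= threshold:
            best[0].append(vec)
            best[1] = centroid(best[0])
        else:
            clusters[label].append([[vec], dict(vec)])
    return [(cl[1], label) for label, cls in clusters.items() for cl in cls]


def knn_classify(query, training, k):
    """Similarity-weighted vote among the k training vectors most similar to query."""
    scored = [(cosine(query, vec), label) for vec, label in training]
    votes = Counter()
    for sim, label in heapq.nlargest(k, scored, key=lambda s: s[0]):
        votes[label] += sim
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ({"engine": 2, "piston": 1}, "mech"),
        ({"engine": 1, "piston": 2}, "mech"),
        ({"protein": 2, "cell": 1}, "bio"),
        ({"protein": 1, "cell": 2}, "bio"),
    ]
    reduced = reduce_training_set(train, threshold=0.5)
    print(len(reduced))  # 2
    print(knn_classify({"engine": 1, "cylinder": 1}, reduced, k=1))  # mech
```

In this sketch, replacing each cluster with its centroid shrinks the number of vectors every query is compared against, which is where a speedup over plain KNN would come from; storing only nonzero weights keeps each similarity computation proportional to the number of distinct terms in a text rather than to the vocabulary size.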
The optimized algorithm not only reduces the amount of computation but also improves KNN classification speed and accuracy.

Finally, patent classification experiments evaluate the optimized KNN algorithm in terms of computation speed, accuracy, error rate, and recall. They confirm that, compared with the original KNN algorithm, the optimized algorithm improves the classification results to some extent.
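The evaluation step reports accuracy, error rate, and recall. A minimal Python sketch of those metrics follows, assuming single-label classification, error rate defined as 1 minus accuracy, and recall computed per class and then macro-averaged; the abstract does not say which averaging the experiments used, and `evaluate` is a hypothetical name. Computation speed could be measured separately, for example by timing the classification loop with `time.perf_counter()`.

```python
from collections import defaultdict


def evaluate(y_true, y_pred):
    """Accuracy, error rate, per-class recall, and macro-averaged recall (assumed definitions)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    support = defaultdict(int)  # number of test texts truly in each class
    hits = defaultdict(int)     # of those, how many were predicted correctly
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            hits[t] += 1
    recall = {c: hits[c] / n for c, n in support.items()}
    macro_recall = sum(recall.values()) / len(recall)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "recall": recall,
        "macro_recall": macro_recall,
    }


if __name__ == "__main__":
    y_true = ["A", "A", "B", "B", "B"]
    y_pred = ["A", "B", "B", "B", "A"]
    print(evaluate(y_true, y_pred))
```

Per-class recall shows whether a speedup was bought by neglecting small patent categories, which overall accuracy alone can hide.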
Keywords/Search Tags: Text classification, KNN algorithm, Clustering, Patent text classification