Information Retrieval Oriented Text Classification Technology Research

Posted on: 2014-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: L W Bao
GTID: 2248330395478021
Subject: Computer technology
Abstract/Summary:
The rapid development of information technology has brought great convenience to daily life, but it has also caused the volume of data files in every field to grow steadily. As data accumulates, text must be classified so that the desired information can be retrieved faster and more accurately. Text classification simplifies querying and benefits data management: it facilitates both data storage and data retrieval. This thesis studies automatic document classification on the basis of the full-text search module of a project and proposes an optimized document classification method. Automatic document classification is an extension of full-text retrieval technology. As the amount of data grows, queries return a certain amount of redundant results, because documents in many categories may contain the same keywords. Building on full-text search, this thesis surveys several mainstream machine learning classification algorithms, including KNN, SVM, Naive Bayes, and decision trees. Then, to address the time efficiency of the KNN classification algorithm, it presents a KNN classification algorithm based on a decision tree: when the classification threshold of the decision tree reaches a certain range, the system uses the KNN classification algorithm to process the unclassified nodes. As a result, text classification efficiency is improved without significantly reducing classification accuracy.
Keywords/Search Tags: query, full-text search, text classification, classification efficiency
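
The abstract does not give the details of the combined method, so the following is only a minimal sketch of one possible reading, not the thesis's implementation. It assumes that documents are represented as sets of tokens, that the decision tree splits on word presence using Gini impurity, that the "classification threshold" is the majority-class share (purity) of the leaf a document reaches, and that the KNN fallback compares the document only with the training documents in that leaf using Jaccard similarity. All function names, parameters, and sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(docs, labels, depth=0, max_depth=2):
    """Build a small word-presence decision tree; docs are sets of tokens."""
    best = None
    if depth < max_depth and len(set(labels)) > 1:
        for word in sorted(set().union(*docs)):
            yes = [i for i, d in enumerate(docs) if word in d]
            no = [i for i, d in enumerate(docs) if word not in d]
            if not yes or not no:
                continue  # this word does not separate the documents
            score = (len(yes) * gini([labels[i] for i in yes])
                     + len(no) * gini([labels[i] for i in no])) / len(docs)
            if best is None or score < best[0]:
                best = (score, word, yes, no)
    if best is None:
        # Leaf: keep its training documents for a possible KNN fallback.
        return {"leaf": True, "docs": docs, "labels": labels}
    _, word, yes, no = best
    return {
        "leaf": False,
        "word": word,
        "yes": build_tree([docs[i] for i in yes], [labels[i] for i in yes],
                          depth + 1, max_depth),
        "no": build_tree([docs[i] for i in no], [labels[i] for i in no],
                         depth + 1, max_depth),
    }


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_vote(doc, docs, labels, k=3):
    """Majority label among the k most similar training documents."""
    ranked = sorted(zip(docs, labels),
                    key=lambda pair: jaccard(doc, pair[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def classify(tree, doc, purity_threshold=0.8, k=3):
    """Use the tree when the reached leaf is pure enough, otherwise KNN."""
    node = tree
    while not node["leaf"]:
        node = node["yes"] if node["word"] in doc else node["no"]
    label, count = Counter(node["labels"]).most_common(1)[0]
    if count / len(node["labels"]) >= purity_threshold:
        return label, "tree"
    # Assumption: KNN searches only the documents in this leaf.
    return knn_vote(doc, node["docs"], node["labels"], k), "knn"


# Hypothetical toy training set (not data from the thesis).
TRAIN = [
    ({"goal", "match", "team"}, "sports"),
    ({"goal", "league", "team"}, "sports"),
    ({"stock", "bank", "loan"}, "finance"),
    ({"stock", "bank", "rate"}, "finance"),
    ({"market", "transfer", "player"}, "sports"),
    ({"market", "price", "share"}, "finance"),
    ({"market", "price", "trade"}, "finance"),
    ({"transfer", "player", "club"}, "sports"),
    ({"bank", "loan", "rate"}, "finance"),
]
train_docs = [d for d, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
tree = build_tree(train_docs, train_labels, max_depth=1)

pure_case = classify(tree, {"bank", "fee", "loan"})       # ("finance", "tree")
mixed_case = classify(tree, {"market", "price", "drop"})  # ("finance", "knn")
```

In this toy run the shallow tree splits on "bank". A document containing "bank" lands in a pure leaf and is labeled by the tree alone, while a document without it lands in a mixed leaf whose majority class is "sports", and the KNN fallback assigns "finance" after comparing it with only the six documents in that leaf. Restricting the neighbor search in this way is one plausible route to the efficiency gain the abstract describes, but the thesis may define the threshold and the fallback differently.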