Research On Feature Selection And Classification Algorithm Of Text Classification

Posted on: 2015-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: R Q Gong
Full Text: PDF
GTID: 2268330428963350
Subject: Computer software and theory
Abstract/Summary:
In recent years, with the rapid development of information technology and the growing popularity of the Internet, the number of electronic documents has increased greatly, and the volume of information will keep growing. Because this information is cluttered and disordered, it is difficult for people to find what they really want in the mass of data. Faced with this situation, often described as "much data and poor information", how to organize and manage massive data efficiently has become an important problem. Text classification is an effective way to address it: it manages and organizes text data, helps people locate text efficiently and accurately, and provides strong support for users in obtaining the information they need. This paper introduces several aspects of text classification in detail, including text representation, text feature extraction, and text classification algorithms, and focuses mainly on two key technologies: feature extraction and the text classification algorithm.

The main contents are as follows:

(1) Research on the feature extraction method. The feature extraction of the traditional TF-IDF algorithm is analyzed, and a new improvement strategy is put forward in order to improve the recall and precision of feature extraction.

(2) Research on the classification algorithm. The advantages and disadvantages of the decision tree and logistic regression classification algorithms are analyzed. The decision tree achieves higher classification accuracy, but on large data sets it requires a large amount of computation. To address this issue, an improved decision tree method based on the logistic regression classification algorithm is proposed: logistic regression is used to reduce the data set, keeping the data that have the greatest influence on classification, which improves the speed of tree construction.

(3) Experimental analysis. Following an in-depth study of the related technology, experiments are used to analyze and verify the proposed improved feature selection and classification algorithms.
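For orientation, the traditional TF-IDF weighting that item (1) takes as its starting point can be sketched as follows. This is a minimal illustration of the standard baseline only; the abstract does not describe the thesis's improvement strategy, so none is shown. The function names `tf_idf` and `top_features`, the toy documents, and the choice of ranking terms by their maximum weight are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed baseline form: tf = count / document length,
    idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_features(docs, k):
    """Hypothetical selection rule: keep the k terms with the highest
    TF-IDF weight seen in any document (ties broken alphabetically)."""
    best = {}
    for doc_weights in tf_idf(docs):
        for term, weight in doc_weights.items():
            best[term] = max(best.get(term, 0.0), weight)
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Toy corpus, invented for illustration.
docs = [
    "the match ended with a late goal".split(),
    "the market closed with a small gain".split(),
    "the team scored a goal in the match".split(),
]
```

Terms that occur in every document, such as "the" here, receive an IDF of log(1) = 0 and therefore never rank as features, which is the behavior that makes TF-IDF usable for feature selection.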
Keywords/Search Tags: Text Categorization, Feature Selection, Classification Algorithm