Research Of Text Classification Algorithm Based On Comparative Feature

Posted on:2009-07-06

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhao

Full Text:PDF

GTID:2178360245453580

Subject:Computer application technology

Abstract/Summary:

With the rapid development of computing, communication, and networking, and with the growing popularity of the Internet, the number of electronic documents keeps increasing. To use this unstructured data efficiently, there is great demand for highly efficient content-based text search, consultation, and filtering systems. Text mining is key to building such systems, and text classification, as an important part of text mining, has become a subject of active study. Many methods have been applied to this field, such as Naive Bayes, SVM, KNN, and neural networks. Among them, Naive Bayes uses prior information to provide a model and a method for reasoning under uncertainty. It is efficient and easy to implement, so it has been widely used. Neural networks are also very popular: they can learn, they are fault-tolerant, and they do not require assumptions about the underlying probability model. In text classification, however, Naive Bayes cannot reflect semantic relations, and the accuracy of a general neural network is not high. To address these problems, the main contributions of the thesis are summarized as follows:

(1) After introducing several common text classification algorithms, the thesis focuses on two classical ones: Naive Bayes and the Self-Organizing Feature Map. Both algorithms are analyzed and compared in detail through a literature review and experiments.

(2) By combining the two algorithms under the idea of "divide and conquer", the thesis puts forward the concepts of the comparative feature and the comparative threshold, and then proposes a new text classification algorithm based on the comparative feature. The analysis, design, and operation of this algorithm are described in detail.

(3) The thesis analyzes and compares the characteristics of the Chinese corpus and the English corpus, as well as the problems of preprocessing each, and presents the methods and results of preprocessing the English corpus. It also analyzes and compares the results of the three algorithms on the two corpora.

(4) The thesis evaluates the three algorithms on the Chinese corpus and the English corpus, comparing the newly proposed algorithm based on the comparative feature with the traditional Naive Bayes algorithm and the Self-Organizing Feature Map.

Experimental results show that the text classification algorithm based on the comparative feature achieves satisfactory results and is an efficient algorithm for text classification.
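
For readers unfamiliar with the Naive Bayes baseline named in the abstract, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the thesis's implementation and does not include the comparative feature or comparative threshold, which the abstract does not define. The function names (train_naive_bayes, log_scores, classify) and the toy documents are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Fit a multinomial Naive Bayes model.

    docs is a list of (tokens, label) pairs; the return value is a plain dict.
    """
    class_counts = Counter()            # number of training documents per class
    word_counts = defaultdict(Counter)  # per-class word frequencies
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "n_docs": len(docs),
    }


def log_scores(model, tokens):
    """Unnormalised log posterior per class, using add-one (Laplace) smoothing."""
    vocab_size = len(model["vocab"])
    scores = {}
    for label, n in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n / model["n_docs"])  # log prior
        for tok in tokens:
            if tok in model["vocab"]:  # skip words never seen in training
                score += math.log((counts[tok] + 1) / (total + vocab_size))
        scores[label] = score
    return scores


def classify(model, tokens):
    """Return the class with the highest log posterior."""
    scores = log_scores(model, tokens)
    return max(scores, key=scores.get)


# Hypothetical toy corpus, not data from the thesis.
TOY_DOCS = [
    ("goal match team score".split(), "sports"),
    ("team win league match".split(), "sports"),
    ("stock market price fall".split(), "finance"),
    ("bank stock price rise".split(), "finance"),
]
MODEL = train_naive_bayes(TOY_DOCS)
```

With this toy model, classify(MODEL, "team match".split()) returns "sports" and classify(MODEL, "stock price".split()) returns "finance". Because each word contributes independently to the score, the sketch also illustrates the limitation the abstract points out: word order and semantic relations between words are ignored.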

Keywords/Search Tags:

Text Classification, Naive Bayes, Self-Organizing Feature Map, Comparative Feature, Comparative Threshold

Related items

1	Research On Text Classification Algorithm Based On Naive Bayes Method
2	Research Of Chinese Text Classification Based On Naive Bayesian Method And Application Of Microblogging Data Classification
3	Text Classification Algorithm Research Based On Naive Bayes
4	Research On Comparative Identification And Comparative Opinion Element Extraction
5	Research On Spam Text Classification Based On Improved Naive Bayes Algorithm
6	Classification Research On News Text Classification Based On Feature Selection Method
7	Design And Implementation Of Text Classification System Based On K-neighborhood And Naive Bayesian
8	Text Classification Method Based On Unsupervised Clustering And Naive Bayesian Classifier
9	Research On Text Classification Algorithms Based On Machine Learning
10	Text Categorization Based On Naive Bayes Method