Research On Text Classification Algorithms Based On Machine Learning

Posted on:2008-11-25

Degree:Master

Type:Thesis

Country:China

Candidate:Z C Yang

GTID:2178360215471053

Subject:Computer application technology

Abstract/Summary:

With the rapid development of computer, database, and network technologies and the spread of the Internet, more and more data and information, much of it text, is being generated in every domain. How to automatically catalog these text data and extract useful information from them has become an increasingly important problem. Text data mining has therefore emerged as a new and fast-developing field.

Text classification is one of the basic techniques of text data mining: it assigns a document to one of a set of predefined classes according to its features. It is widely used in natural language processing and analysis, information organization and management, and content filtering. Early text classification methods were based on knowledge engineering and expert systems, which were complex and inflexible. With the rise of machine learning, many classifier models have been introduced into text classification, each effective in different respects.

Different text classification algorithms currently perform well in different applications, so choosing the most suitable algorithm for a given application is an important problem. Accuracy is one of the most widely used measures of classifier performance. However, when the class distribution is imbalanced or misclassification costs differ, accuracy does not give a reliable evaluation. For such cases, AUC (the area under the ROC curve) has been proposed as a new evaluation measure for text classification. Some studies have shown that AUC is more robust than accuracy and that it also evaluates how well a classifier ranks instances.
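The claim that accuracy can mislead on imbalanced data while AUC also captures ranking quality can be illustrated with a small sketch. It assumes a hypothetical test set of 5 positive and 95 negative documents and two hypothetical classifiers; AUC is computed in its pairwise (Wilcoxon-Mann-Whitney) form, which is one standard way to obtain it and not necessarily the procedure used in the thesis.

```python
"""Accuracy vs. AUC on an imbalanced binary problem (illustrative data only)."""


def accuracy(labels, preds):
    """Fraction of instances whose predicted label equals the true label."""
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as one half).
    Pairwise O(P*N) computation, adequate for a small demonstration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test set: 5 positive documents, 95 negative ones.
labels = [1] * 5 + [0] * 95

# Hypothetical classifier A: same score for every document, always predicts 0.
scores_a = [0.0] * 100
preds_a = [0] * 100

# Hypothetical classifier B: scores every positive above every negative,
# but its assumed 0.5 threshold still labels every document negative.
scores_b = [0.4] * 5 + [0.1] * 95
preds_b = [1 if s >= 0.5 else 0 for s in scores_b]

acc_a, auc_a = accuracy(labels, preds_a), auc(labels, scores_a)
acc_b, auc_b = accuracy(labels, preds_b), auc(labels, scores_b)
print(acc_a, auc_a)  # 0.95 0.5
print(acc_b, auc_b)  # 0.95 1.0
```

Both classifiers reach the same 95% accuracy simply by favoring the majority class, so accuracy cannot tell them apart. AUC does not depend on the decision threshold: it gives the uninformative classifier 0.5 and the one that ranks every positive first 1.0.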
It is therefore worth asking whether text classification algorithms that perform "well" by current standards remain effective under the new measure. Although the new measure has been proposed, the classic text classification algorithms have not yet been comprehensively evaluated with it. This thesis reports a controlled study on uniform datasets comparing the performance of SVM, decision tree, nearest neighbor, naive Bayes, and multinomial event model naive Bayes. The main work is as follows. First, several popular text classification algorithms and their basic principles are introduced and analyzed. Second, a new evaluation measure for text classifiers is introduced, its evaluation principle is analyzed, and it is compared with the old measure. Third, an experiment is designed to evaluate the performance of several popular text classification algorithms; their shortcomings under the new measure are identified, and directions for improvement are indicated.
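Of the compared algorithms, the multinomial event model naive Bayes is the one whose name alone says least about how it differs from ordinary naive Bayes. The sketch below shows one common formulation: a document is scored by the log prior plus the log probability of each word occurrence, so repeated words count repeatedly. The smoothing constant (alpha = 1, Laplace smoothing), the skipping of unseen words, the `log_odds` ranking score, and the toy corpus are assumptions for illustration; the abstract does not describe the thesis's own implementation.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Multinomial event model naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing constant; 1.0 is Laplace smoothing

    def fit(self, docs, labels):
        """docs: list of token lists; labels: one class label per document."""
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        # Word-occurrence counts per class: a repeated word is counted each time.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc)
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_joint(self, doc, c):
        """log P(c) + sum over tokens of log P(w | c); unseen words are skipped."""
        denom = self.total[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for w in doc:
            if w in self.vocab:
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        """Return the class with the highest log joint probability."""
        return max(self.classes, key=lambda c: self.log_joint(doc, c))

    def log_odds(self, doc, positive, negative):
        """Hypothetical real-valued score for ranking documents (e.g. for AUC)."""
        return self.log_joint(doc, positive) - self.log_joint(doc, negative)


# Hypothetical toy corpus, already tokenized.
docs = [
    ["cheap", "pills", "cheap"],
    ["buy", "cheap", "now"],
    ["meeting", "agenda"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]

nb = MultinomialNaiveBayes().fit(docs, labels)
print(nb.predict(["cheap", "now"]))      # spam
print(nb.predict(["meeting", "notes"]))  # ham
```

Because the classifier produces a real-valued score rather than only a label, its output can be ranked, which is what an AUC-based evaluation requires.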

Keywords/Search Tags:

text classification, naive bayes, SVM, decision tree, nearest neighbor, AUC

Related items

1	Research On Text Classification Algorithm Based On Naive Bayes Method
2	Research On Hybrid Classification Based On Navie Bayes And Decision Tree
3	Research On Bayesian Networks-Based Text Classification Algorithms
4	Based Segmentation Of Chinese Text Automatic Classification And Implementation
5	Multi-view Adaptive K-nearest Neighbor Classification Based On Decision Tree
6	Text Categorization Based On Naive Bayes Method
7	Efficient Tumor Traceability Prediction Based On Hybrid Machine Learning
8	A Text Classifier About High Blood Pressure Based On Naive Bayes
9	Study On Generalized Nearest Neighbor Pattern Classification
10	Application Of Various Classification Methods In Spam Message Recognition