| With the rapid development of computer, database, and network technologies and the growing popularity of the Internet, ever more data and information are generated in every domain, especially text data. How to automatically categorize and retrieve these text data and extract useful information from them has become an increasingly important problem. Text data mining has therefore gradually become a prominent and fast-developing research area. Text classification is one of the basic techniques of text data mining; its function is to assign a document to one of a set of predefined classes according to its features. Text classification is widely used in natural language processing and analysis, information organization and management, and content filtering. Early text classification methods were based on knowledge engineering and expert systems, which were very complex and inflexible. With the rise and development of machine learning, many classifier models have been introduced into the text classification domain, each effective in different respects. Currently, different text classification algorithms are used in different applications with good performance, so how to select the most suitable algorithm for a given application is an important problem. Accuracy is one of the most widely used measures for evaluating a classifier's performance. However, when the class distribution is unbalanced or the misclassification costs differ, accuracy does not give a reliable assessment. For such cases, AUC (the area under the ROC curve) has been proposed as a new evaluation measure for text classification performance. Some studies have shown that AUC is more robust than accuracy and that AUC can also evaluate ranking quality. 
It is therefore an open question whether the text classification algorithms currently regarded as good remain effective under the new measure. Although the new measure has been proposed, there has been no comprehensive evaluation of the classic text classification algorithms under it. This paper reports a controlled study on uniform datasets, comparing the performance of SVM, decision tree, nearest neighbor, naive Bayes, and multinomial event model naive Bayes classifiers. The main work is as follows. First, it introduces and analyzes several popular text classification algorithms and their basic principles. Second, it introduces a new evaluation measure for text classifiers, analyzes its evaluation principle, and compares it with the old measure. Third, it designs an experiment to evaluate the performance of several popular text classification algorithms, points out their shortcomings under the new measure, and indicates directions for improvement. |
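
The contrast between accuracy and AUC on unbalanced data can be illustrated with a small sketch. It is not taken from the paper's experiments: the labels, the scores, the 0.5 decision threshold, and the two classifiers `A` and `B` are all hypothetical, chosen only to show how two classifiers with identical accuracy can differ sharply in AUC. AUC is computed here by its pairwise-ranking definition (the fraction of positive/negative pairs in which the positive document receives the higher score, with ties counted as one half).

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of documents whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


# Hypothetical unbalanced sample: 2 positive documents, 18 negative ones.
labels = [1, 1] + [0] * 18

# Hypothetical classifier A: gives every document the same low score.
scores_a = [0.1] * 20

# Hypothetical classifier B: ranks the positives near the top, but every
# score stays below the 0.5 threshold, so it also predicts "negative" everywhere.
scores_b = [0.45, 0.30] + [0.35] + [0.2] * 17

acc_a, acc_b = accuracy(labels, scores_a), accuracy(labels, scores_b)
auc_a, auc_b = auc(labels, scores_a), auc(labels, scores_b)

print(f"A: accuracy={acc_a:.3f}  AUC={auc_a:.3f}")
print(f"B: accuracy={acc_b:.3f}  AUC={auc_b:.3f}")
```

Under these assumed inputs both classifiers label every document as negative and so have the same accuracy, while only AUC reflects that classifier B orders the documents far better than classifier A.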