Analysis And Research Of Common Text Classification Algorithms

Posted on: 2018-04-21  Degree: Master  Type: Thesis
Country: China  Candidate: K Yang  Full Text: PDF
GTID: 2348330536969312  Subject: Applied statistics
Abstract/Summary:
The arrival of the big data era has caused explosive growth of data in many fields, and the value of more and more data is waiting to be discovered, which has made data mining technology popular in recent years. Because traditional data analysis methods cannot handle text data, the value of unstructured data has not been fully exploited. Compared with traditional data mining, text mining is in greater demand and better suited to this background. Text classification has been one of the hot topics in text mining and is widely used in various fields. In the theoretical part, this thesis begins with the concept of text mining and introduces the relevant background, including text preprocessing, weight calculation, feature selection and text representation. It then presents common statistical models and methods, such as K-Nearest Neighbor (KNN), Naive Bayes (NB) and Decision Tree. For Ensemble Learning, it gives an overview of Bagging and Boosting, compares the two methods, and describes the Random Forest algorithm. In the practical part, we first select three single classification models and compare their results on text classification. In terms of classification accuracy, the KNN algorithm, owing to its stability and flexibility, performs better than the NB algorithm and the Decision Tree method. We then build a Random Forest model on the same data and compare its classification results with those of the KNN algorithm; the comparison shows that Ensemble Learning achieves higher classification accuracy than a single model. Finally, considering that text data is usually larger in practical applications, a text classification model based on a single machine is tried.
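
As a rough illustration of the pipeline the abstract describes (text representation with TF-IDF weights followed by KNN classification), the following is a minimal sketch using only the Python standard library. The toy corpus, the function names, the whitespace tokenizer, the smoothed IDF formula and the cosine-similarity vote are all assumptions made for this example; they are not the thesis's data, preprocessing or exact weighting scheme.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the thesis's actual data set is not given here.
TRAIN = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the team after the match", "sports"),
    ("the stock market fell sharply today", "finance"),
    ("investors sold shares as the market dropped", "finance"),
    ("the bank raised interest rates", "finance"),
]

def tokenize(text):
    """Very simple preprocessing: lowercase and split on whitespace."""
    return text.lower().split()

def fit_idf(docs_tokens):
    """Smoothed inverse document frequency (an assumed variant)."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; unseen terms are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k most similar training documents."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def classify(text, k=3):
    """Fit TF-IDF on TRAIN, then classify `text` with KNN."""
    docs_tokens = [tokenize(doc) for doc, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    idf = fit_idf(docs_tokens)
    train_vecs = [tfidf_vector(tokens, idf) for tokens in docs_tokens]
    return knn_predict(tfidf_vector(tokenize(text), idf), train_vecs, labels, k)
```

For instance, `classify("the team scored a goal in the match")` returns `"sports"` on this toy corpus. A real experiment of the kind the abstract reports would additionally need feature selection, a held-out test set and an accuracy comparison across models.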
Keywords/Search Tags: Text Classification, Vector Space Model, TF-IDF, Ensemble Learning