Chinese Text Classification And Python Implementation Based On Naive Bayes

Posted on:2019-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:H Zhang

Full Text:PDF

GTID:2438330548455967

Subject:Applied Statistics

Abstract/Summary:

With the continued spread of computers and the rapid development of the Internet, new scientific and technological knowledge is emerging constantly, and the volume of available information is unprecedented. Information sources are extremely broad and dissemination is extremely fast, so obtaining valuable information from this mass of data in a short time has become an urgent need. Chinese text classification, a text data mining technique that applies a combination of statistical and machine learning methods, has been developed to meet this need. It assigns each text to user-defined categories based on attribute features of its content, such as topic words. In general, the classifier takes the feature vector of a text as input and outputs its category. This thesis first introduces the research background of text classification, the state of research in China and abroad, and the practical value of the method. It then presents the theoretical workflow of Chinese text classification and the principles behind the naive Bayes classifier and the logistic regression classifier. In the experimental stage, news data from five categories of the "Sogou Corpus" were selected and processed in Python, using the Anaconda distribution, following the theoretical workflow of Chinese text classification. First, word segmentation and stop-word removal were performed on the data set. Then TF-IDF was combined with N-Gram features to perform dimensionality reduction. Naive Bayes and logistic regression classifiers were built to classify the Chinese texts. To make the estimates of the classifiers' precision and recall more reliable, cross-validation was used. Finally, a search was carried out for the classifiers' optimal parameters. The comparison showed that the naive Bayes classifier achieved the better classification performance.
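
The classification step described above can be illustrated with a minimal sketch. It is not the thesis's code: it assumes a multinomial naive Bayes model with Laplace smoothing, and it uses raw counts of character bigrams as a stand-in for the segmented-word, TF-IDF weighted N-Gram features the abstract mentions. The names `char_ngrams` and `MultinomialNB` and the four training sentences are hypothetical, chosen only for illustration; the Sogou Corpus is not used here.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams (no word segmentation needed)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


class MultinomialNB:
    """Toy multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            grams = char_ngrams(doc)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        # Total number of n-gram occurrences seen in each class.
        self.totals = {c: sum(self.feature_counts[c].values())
                       for c in self.class_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # log P(c) + sum over features of log P(feature | c)
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * vocab_size
            for g in char_ngrams(doc):
                if g in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((self.feature_counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical toy data: two sports sentences and two finance sentences.
train_docs = ["球队赢得了比赛", "球员在比赛中进球", "股票市场今天上涨", "银行下调贷款利率"]
train_labels = ["sports", "sports", "finance", "finance"]

clf = MultinomialNB().fit(train_docs, train_labels)
print(clf.predict("比赛进球"))
print(clf.predict("股票上涨"))
```

In a fuller pipeline, the character bigrams would be replaced by segmented words with stop words removed, the raw counts by TF-IDF weights, and the smoothing parameter `alpha` would be chosen by cross-validated search, as the abstract describes.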

Keywords/Search Tags:

Chinese text classification, TF-IDF, Naive Bayesian classifier, Python

Related items

1	Research On Chinese Text Sentiment Polarity Classification Based On Naive Bayesian
2	Research Of Chinese Text Classification Based On Naive Bayesian Method And Application Of Microblogging Data Classification
3	Research On Chinese Text Sentiment Polarity Classification
4	The Application Of A Text Classification Method In Fraud SMS Identification
5	Research Of Junk Filtering Algorithm Based On Naive Bayesian Of Accumulating Knowledge
6	Research On Chinese Information Classification Based On Improved Bayesian Algorithms
7	Text Classification Method Based On Unsupervised Clustering And Naive Bayesian Classifier
8	Design And Implementation Of Text Classification System Based On K-neighborhood And Naive Bayesian
9	Design And Implementation Of Short Message Classification System Based On Naive Bayesian
10	Chinese Web Pages Based On Naive Bayesian Classification Technology Research And Application