Research On Chinese Information Classification Based On Improved Bayesian Algorithms

Posted on:2020-06-29

Degree:Master

Type:Thesis

Country:China

Candidate:X M Song

GTID:2428330572472202

Subject:Electronic and communication engineering

Abstract/Summary:

With the rapid development of the Internet, thousands of new texts appear online. Most of this data is stored as text and grows exponentially, which could lead to an information explosion. To manage such a large amount of text, the text classification problem urgently needs to be solved. In addition, naive Bayes text classification rests on the conditional independence assumption, which does not hold in practice. Among the many proposals for improving its accuracy by weakening the feature independence assumption, feature weighting has received comparatively little attention from researchers. Moreover, existing feature weighting approaches incorporate the learned feature weights only into the classification formula of naive Bayes and not into its conditional probability formula. Therefore, from the perspective of feature weighting, this paper proposes a Bayesian algorithm based on term frequency-inverse document frequency (TF-IDF) feature weights and rank factor feature weights and applies it to Chinese text classification, which helps manage large and complex data, helps people find information quickly, and saves time. The main research contents of this paper are as follows:

(1) Naive Bayes, KNN, and support vector machines are compared on text classification. The experimental results show that, of the three, naive Bayes performs best for Chinese text classification.

(2) This paper proposes a naive Bayes algorithm based on TF-IDF feature weights and rank factor feature weights, referred to as the feature weighting naive Bayes algorithm. The algorithm incorporates TF-IDF into the conditional probability formula of naive Bayes and then introduces the rank factor feature weight, which is determined by TF-IDF, into the Bayesian formula. This can greatly weaken the influence of the feature independence assumption.

(3) The feature weighting naive Bayes algorithm is applied to Chinese text classification. Because of the complexity of the various corpora on the network, there is so far no corpus that is used consistently for Chinese text categorization, so this paper constructs a Chinese text corpus according to a set of screening rules. Experiments show that the accuracy of the feature weighting naive Bayes algorithm in text classification is higher than that of the standard naive Bayes algorithm, which indicates that the proposed algorithm is a more effective and accurate text classification algorithm.
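
The abstract does not give the thesis's formulas, so the following Python sketch only illustrates the general idea of using a feature weight in both places it mentions: in the conditional probability estimates (as weighted counts) and in the classification formula (as an exponent on each term's probability). It is not the author's algorithm. The names `idf_weights` and `FeatureWeightedNB` are hypothetical, the smoothed IDF formula is one common choice, and because the rank factor is not defined in the abstract, the IDF weight is reused as a stand-in in the scoring step. Documents are assumed to be already segmented into words.

```python
import math
from collections import Counter, defaultdict


def idf_weights(docs):
    """Smoothed inverse document frequency per term (one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


class FeatureWeightedNB:
    """Multinomial naive Bayes with a per-term weight (illustrative only)."""

    def fit(self, docs, labels, alpha=1.0):
        self.idf = idf_weights(docs)
        self.vocab = sorted(self.idf)
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        self.log_prior = {
            c: math.log(class_counts[c] / len(docs)) for c in self.classes
        }
        # Assumption: each occurrence of term t adds idf[t] instead of 1,
        # so the weight enters the conditional probability estimate.
        weighted = {c: defaultdict(float) for c in self.classes}
        for doc, c in zip(docs, labels):
            for t in doc:
                weighted[c][t] += self.idf[t]
        self.log_cond = {}
        for c in self.classes:
            # Laplace-style smoothing over the weighted counts.
            total = sum(weighted[c].values()) + alpha * len(self.vocab)
            self.log_cond[c] = {
                t: math.log((weighted[c][t] + alpha) / total)
                for t in self.vocab
            }
        return self

    def predict(self, doc):
        # Terms never seen in training are ignored.
        tf = Counter(t for t in doc if t in self.idf)

        def score(c):
            # Assumption: idf[t] stands in for the undefined rank factor
            # and acts as an exponent on P(t | c), i.e. a factor in log space.
            return self.log_prior[c] + sum(
                count * self.idf[t] * self.log_cond[c][t]
                for t, count in tf.items()
            )

        return max(self.classes, key=score)


# Toy, pre-segmented documents (made up for illustration).
train_docs = [
    ["股票", "市场", "上涨"],
    ["股票", "基金", "投资"],
    ["球队", "比赛", "胜利"],
    ["球员", "比赛", "进球"],
]
train_labels = ["finance", "finance", "sports", "sports"]

model = FeatureWeightedNB().fit(train_docs, train_labels)
print(model.predict(["股票", "投资"]))  # finance
```

Setting every weight to 1 in this sketch reduces it to standard multinomial naive Bayes with Laplace smoothing, which makes it easy to compare the weighted and unweighted variants on the same data.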

Keywords/Search Tags:

naive bayes, feature weighting, Chinese text classification, term frequency-inverse document frequency

Related items

1	Text Classification Algorithm Research Based On Naive Bayes
2	Research On Text Classification Algorithm Based On Naive Bayes Method
3	Research On Text Classification Based On Deep Learning
4	Event Trigger Recognition Based On Positive And Negative Weighting And Its Application
5	Term weighting revisited
6	Research On Feature Selection Algorithm Based On Segmented Term Frequency In Text Classification
7	Research And Application Of Feature Selection Based On Term Frequency Reordering Of Document Level
8	Research Of Chinese Text Classification Based On Naive Bayesian Method And Application Of Microblogging Data Classification
9	Improvement Of Naive Bayes Text Classification Algorithm Based On Unbalanced Dataset
10	Research On Text Classification Algorithms Based On Machine Learning