Research On Harmful Information Recognition Based On A Bayesian Network Classification Algorithm

Posted on: 2020-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: P Ding
Full Text: PDF
GTID: 2428330599460192
Subject: Electronic Science and Technology
Abstract/Summary:
Recognizing harmful information in text form is essentially a text categorization problem. Spam filtering and network public opinion analysis can both be treated as binary classification of short texts. The sparseness of Chinese texts leads to high-dimensional features. In addition, the features of the Bayesian classification model are incomplete, and the conditional independence assumption it relies on does not hold in practice. These shortcomings are important factors restricting short text categorization. Based on the practical needs of spam filtering and network public opinion analysis, this thesis makes two improvements, one to feature extraction and one to the structure learning algorithm. First, a term frequency-inverse document frequency (TF-IDF) algorithm based on headword expansion is proposed. The algorithm takes the structural characteristics of the model into account to address the high dimensionality of the features; it increases feature diversity while reducing feature dimensionality. Second, the genetic algorithm and gray wolf optimization are combined into a hybrid gray wolf optimization-genetic algorithm, which addresses both feature incompleteness and the failure of the conditional independence assumption. A three-layer model is used to avoid feature incompleteness, and the hybrid algorithm relaxes the conditional independence assumption between the attributes of the classifier model. Finally, the two improved algorithms are applied to spam filtering and network public opinion analysis. Experimental analysis shows that the two improved algorithms and the three-layer Bayesian structure model are feasible, and that a classifier based on the proposed algorithms can improve classification performance. On this basis, harmful information recognition software based on a Bayesian network classifier is designed.
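For orientation, the sketch below shows the plain baseline that the abstract takes as its starting point: standard TF-IDF weighting and a naive Bayes classifier with the conditional independence assumption left intact. It is not the thesis's method. Headword expansion, the three-layer structure, and the gray wolf optimization-genetic algorithm are not implemented here, and the function names, class name, and toy messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Standard TF-IDF only; the headword expansion described in the
    abstract is NOT implemented (assumption: plain baseline).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Terms are treated as conditionally independent given the class,
    which is the assumption the thesis sets out to relax.
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocab = {term for doc in docs for term in doc}
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / total)
            denom = sum(self.term_counts[label].values()) + len(self.vocab)
            for term in doc:
                if term in self.vocab:  # ignore terms unseen in training
                    score += math.log((self.term_counts[label][term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: pre-tokenized short messages.
train_docs = [
    ["win", "free", "prize", "now"],
    ["free", "cash", "offer"],
    ["meeting", "agenda", "today"],
    ["project", "report", "meeting"],
]
train_labels = ["spam", "spam", "ham", "ham"]

weights = tfidf_weights(train_docs)
model = NaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict(["free", "prize"])` scores each class by its log prior plus the per-term log likelihoods. Because every term contributes independently, correlations between terms are ignored, which is the limitation a learned Bayesian network structure is meant to address.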
Keywords/Search Tags: harmful information recognition, text categorization, Bayesian classifier, feature extraction, genetic algorithm