Research On Bayesian Networks-Based Text Classification Algorithms

Posted on: 2017-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: S S Wang
GTID: 2308330491455327
Subject: Engineering
Abstract/Summary:
Classification is one of the most important tasks in data mining and has been widely applied in the real world. Automatic text classification is a text retrieval task that assigns a text to one or more predefined classes based on its content. With the rapid development of the Internet, obtaining the desired information from large text collections and predicting the class of a given text have become hot topics. At present, commonly used classification models include Bayesian networks, decision trees, support vector machines, and artificial neural networks. Naive Bayes is the simplest and one of the most effective Bayesian network models; it ignores the interdependence between attributes and assumes that the attributes are independent given the class. Because of its excellent speed and accuracy in classification, naive Bayes has been applied in many fields, of which text classification is a typical one.
Compared with traditional data mining, the data that text classification faces is high-dimensional, continuous and sparse. The naive Bayes model is widely used in text classification because it is simple, efficient and easy to understand. A large number of text classification methods based on the naive Bayes model now exist; the Bernoulli naive Bayes model (BNB) was the first to extend naive Bayes to text categorization. Building on this breakthrough, the multinomial naive Bayes model (MNB), the complement multinomial naive Bayes model (CNB) and the one-versus-all-but-one model (OVA) were proposed.
As the naive Bayes model itself has been improved, many improved naive Bayes text classifiers have been proposed as well. Carrying improvements of naive Bayes over to text classification has driven unprecedented development in large-scale text mining.
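The multinomial naive Bayes model that the thesis takes as its base can be illustrated with a minimal sketch. The code below is not taken from the thesis: it assumes the standard MNB formulation with Laplace (add-one) smoothing, and the function names `train_mnb` and `predict_mnb` and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels):
    """Estimate log priors and Laplace-smoothed log word likelihoods.

    docs   : list of token lists
    labels : list of class labels, one per document
    """
    vocab = sorted({w for d in docs for w in d})
    class_docs = Counter(labels)              # documents per class
    word_counts = defaultdict(Counter)        # word frequencies per class
    for d, c in zip(docs, labels):
        word_counts[c].update(d)

    n = len(docs)
    log_prior = {c: math.log(k / n) for c, k in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        # Add-one smoothing over the vocabulary (an assumed choice).
        log_like[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_prior, log_like

def predict_mnb(model, doc):
    """Return the class maximising log P(c) + sum over tokens of log P(w|c)."""
    log_prior, log_like = model
    scores = {}
    for c in log_prior:
        # Each token occurrence contributes once; unseen words are skipped.
        scores[c] = log_prior[c] + sum(
            log_like[c][w] for w in doc if w in log_like[c]
        )
    return max(scores, key=scores.get)

# Hypothetical toy corpus, for illustration only.
DOCS = [
    ["ball", "goal", "team"],
    ["goal", "match", "team"],
    ["vote", "party", "election"],
    ["party", "vote", "policy"],
]
LABELS = ["sports", "sports", "politics", "politics"]

model = train_mnb(DOCS, LABELS)
print(predict_mnb(model, ["goal", "team", "goal"]))
print(predict_mnb(model, ["vote", "policy"]))
```

Working in log space avoids numeric underflow when documents contain many words, which matters for the high-dimensional, sparse data described above.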
To weaken the attribute independence assumption of naive Bayes, there are three main ways to improve the model: manipulating attributes, manipulating instances, and structure extension. This thesis takes the multinomial naive Bayes model as its base and studies improvements to naive Bayes text classification from each of these three aspects. We propose a CFS-based feature weighting approach for naive Bayes text classifiers, adapt the naive Bayes tree to text classification, and present a structure-extended multinomial naive Bayes model for text classification.
The main contributions of this thesis are:
1) This thesis describes the naive Bayes model in detail, states its problems and their solutions, reviews the existing classical naive Bayes text classification methods, and presents several typical improvements to them in detail.
2) This thesis gives the learning algorithm framework of attribute-weighted naive Bayes text classifiers, reviews existing attribute weighting improvements to naive Bayes text classifiers, and proposes a CFS-based feature weighting approach in which the weights of the selected attributes are increased, thereby improving multinomial naive Bayes text classifiers. For the first time, this approach uses the attribute weights not only at classification time but also when building the model.
3) This thesis gives the learning algorithm framework of naive Bayes tree text classifiers and proposes a naive Bayes tree text classification method that extends the classical NBTree model to text classification, solves several main problems that arise in extending the model, and applies the tree model to text classification for the first time. To further improve this model, the approach combines multiclass learning techniques with the naive Bayes tree text classifier and gives a multiclass version.
4) This thesis gives the learning algorithm framework of structure extension for naive Bayes text classifiers, reviews existing structure extension improvements to naive Bayes, and proposes a structure-extended multinomial naive Bayes model inspired by AODE, the classical structure extension algorithm for naive Bayes, and its weighted version WAODE. This approach applies structure extension to text classification for the first time and trades the large space cost for a relatively small increase in time cost.
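The attribute weighting idea in contribution 2, where weights enter both model building and classification, might look roughly like the sketch below. This is an illustration under stated assumptions, not the thesis's algorithm: the abstract does not give the formulas, so the code assumes that each word frequency is multiplied by its weight in the likelihood estimates and again in the decision rule. The CFS selection step is not implemented; the `WEIGHTS` dictionary, the boost value 2.0, the function names and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_weighted_mnb(docs, labels, weights):
    """Build an MNB model from weighted word frequencies.

    weights maps a word to its weight; words not listed default to 1.0.
    """
    vocab = sorted({w for d in docs for w in d})
    weighted = defaultdict(lambda: defaultdict(float))
    for d, c in zip(docs, labels):
        for w, f in Counter(d).items():
            # Assumption: the weight scales the frequency during training.
            weighted[c][w] += weights.get(w, 1.0) * f

    n = len(docs)
    class_docs = Counter(labels)
    log_prior = {c: math.log(k / n) for c, k in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(weighted[c].values())
        log_like[c] = {
            w: math.log((weighted[c].get(w, 0.0) + 1.0) / (total + len(vocab)))
            for w in vocab
        }
    return log_prior, log_like

def predict_weighted_mnb(model, doc, weights):
    """Score each class with weighted frequencies and return the best one."""
    log_prior, log_like = model
    scores = {}
    for c in log_prior:
        s = log_prior[c]
        for w, f in Counter(doc).items():
            if w in log_like[c]:
                # Assumption: the same weight scales the term at test time.
                s += weights.get(w, 1.0) * f * log_like[c][w]
        scores[c] = s
    return max(scores, key=scores.get)

# Hypothetical toy corpus; the "selected" words are chosen by hand here,
# standing in for the output of a feature selection step such as CFS.
MAIL_DOCS = [
    ["cheap", "offer", "click"],
    ["offer", "meeting", "click"],
    ["meeting", "agenda", "report"],
    ["report", "agenda", "offer"],
]
MAIL_LABELS = ["spam", "spam", "ham", "ham"]
WEIGHTS = {"click": 2.0, "agenda": 2.0}   # assumed boost for selected words

weighted_model = train_weighted_mnb(MAIL_DOCS, MAIL_LABELS, WEIGHTS)
print(predict_weighted_mnb(weighted_model, ["offer", "click"], WEIGHTS))
print(predict_weighted_mnb(weighted_model, ["agenda", "meeting"], WEIGHTS))
```

Setting every weight to 1.0 reduces this sketch to plain multinomial naive Bayes, which makes the effect of the weights easy to compare on a given corpus.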
Keywords/Search Tags: Text Classification, Naive Bayes, Attribute Weighting, Decision Tree, Structure Extension