Research On Naive Bayes Classification With Weighted Citation Features

Posted on: 2017-04-10  Degree: Master  Type: Thesis
Country: China  Candidate: H Bai  Full Text: PDF
GTID: 2308330485971055  Subject: Library and Information Science
Abstract/Summary:
Text classification is the process of assigning documents to categories, and labeling them accordingly, using criteria learned from a training set. Automatic text classification supports effective information management and more convenient information services. Naive Bayes classifies efficiently and with high accuracy, and it is easy to update incrementally, so it is popular in information service systems built on automatic text classification.

Academic literature contains many citations, and the network they form carries information beyond the text of the original documents. That information can therefore improve classifier performance. In this thesis I combined citations with Naive Bayes classification. The main work is as follows.

First, I systematically introduced the general process of text classification. I then analyzed the different Naive Bayes models in detail and pointed out that Naive Bayes is sensitive to noisy features, which makes feature selection important for it. On that basis I reviewed the improvements to Naive Bayes known as semi-Naive Bayes and summarized their advantages and weaknesses: they relax the conditional independence assumption but increase time complexity, so whether to use them depends on the situation. Semi-Naive Bayes improves the classifier in three ways: the z-dependence classifier, effective feature subsets, and weighted Naive Bayes. I also reviewed the classification of network data, which includes citation networks.

Then, building on the review of network data, I proposed several ways to improve Naive Bayes: using the citation network, using the terms in citations, and weighting text fields. In addition, some references are cited more than once within a single article; I used this count to weight citations and obtained good results.

The conclusions of the thesis are:
(1) Citations can help text classification. A classification model built from citations alone is not effective, but combining citations with the main text is promising; in particular, adding the terms in citations improves the precision of the classifier.
(2) Giving different weights to titles, abstracts, main text, and citation titles (4:2:1:2) can improve feature selection, with gains in both precision and recall.
(3) Citations help more when the number of features is small. On the citation field, the multivariate Bernoulli model performs well compared with the Bayesian network and the multinomial Naive Bayes model.
(4) In most conditions the CTNB classifier performs better in terms of accuracy, while the WNB classifier shows the larger improvement in recall. Judged by F-measure, CTNB is the better classifier.
(5) Citations perform differently across categories. Using citations gives high precision for categories whose documents are more similar to one another, while using the terms in citations performs well in general.
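As a rough illustration of the field weighting described in the abstract (4:2:1:2 for titles, abstracts, main text, and citation titles), the sketch below scales term counts by field before feeding them to a multinomial Naive Bayes model with Laplace smoothing. It is not the thesis's WNB or CTNB implementation. The class name `FieldWeightedNB`, the field keys, the whitespace tokenizer, and the rule that multiplies a citation's weight by the number of times it is cited in the article are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Field weights taken from the abstract:
# title : abstract : main text : citation titles = 4 : 2 : 1 : 2.
# The dictionary keys themselves are hypothetical names.
FIELD_WEIGHTS = {"title": 4.0, "abstract": 2.0, "body": 1.0, "citation_titles": 2.0}


def weighted_counts(doc):
    """Collapse a multi-field document into one weighted term-count vector."""
    counts = Counter()
    for field in ("title", "abstract", "body"):
        for term in doc.get(field, "").lower().split():
            counts[term] += FIELD_WEIGHTS[field]
    # Assumption: each citation is a (cited_title, mentions) pair, where
    # mentions is how often the reference is cited within the article.
    # Multiplying by mentions is one possible weighting, not the thesis's formula.
    for cited_title, mentions in doc.get("citations", []):
        for term in cited_title.lower().split():
            counts[term] += FIELD_WEIGHTS["citation_titles"] * mentions
    return counts


class FieldWeightedNB:
    """Multinomial Naive Bayes over field-weighted term counts (hypothetical sketch)."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)          # documents per class
        self.term_totals = defaultdict(Counter)    # weighted term counts per class
        for doc, label in zip(docs, labels):
            self.term_totals[label].update(weighted_counts(doc))
        self.vocab = {t for c in self.term_totals.values() for t in c}
        return self

    def predict(self, doc):
        counts = weighted_counts(doc)
        n_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_docs.items():
            total = sum(self.term_totals[label].values())
            score = math.log(n / n_docs)           # log prior
            for term, weight in counts.items():
                if term not in self.vocab:
                    continue                       # skip terms unseen in training
                # Laplace-smoothed likelihood of the term given the class.
                p = (self.term_totals[label][term] + 1.0) / (total + len(self.vocab))
                score += weight * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_docs = [
    {"title": "bayes classifier", "abstract": "text classification",
     "body": "naive bayes model", "citations": [("text categorization", 2)]},
    {"title": "library catalog", "abstract": "cataloguing rules",
     "body": "library records", "citations": [("library management", 1)]},
]
train_labels = ["ml", "lib"]
model = FieldWeightedNB().fit(train_docs, train_labels)
```

Calling `model.predict({"title": "bayes text"})` on this toy data selects the `"ml"` class, because both query terms carry weighted counts only in that class. Real use would need proper tokenization and feature selection, which the abstract notes is important for Naive Bayes.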
Keywords/Search Tags: Naive Bayes, Text classification, Weighted Citation, Weighted Text Field