Font Size: a A A

Research On Hybrid Classification Based On Navie Bayes And Decision Tree

Posted on:2017-03-22Degree:MasterType:Thesis
Country:ChinaCandidate:D M LiFull Text:PDF
GTID:2308330482979884Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Data is a kind of valuable source all the time. Through the useful information detected from data, the economic benefit could be created in many fields. Data mining is an approach to mine the useful information from low signal to noise ratio data. And classification is one of the most important hotspots in data mining research. There exist numbers of classifica-tion approach, such as naive bayes, decision tree and support vector machine. Naive bayes and decision tree are widely used in many fields for its’ simple, fast and the robust result. For example, the spelling check of Google is based on naive bayes. In this paper, a novel approach is proposed which includes:(1)Even though the result naive bayes is good, but there are still two weaknesses for naive bayes in some special case, the condition of dependence between any two attributes and the rough process of probability estimate. In this paper, an optimization function is proposed for naive bayes to improve performance for the rough process. This approach is called R-NBC. In order avoid the underflow and overfitting, the optimization function consider the normal attributes and the attributes whose conditional probability is zero. The performance are tested via the UCI datasets, whose results show that the classification accuracy are improved by R-NBC compared with traditional approach, especially for high dimensional data.(2)For multi class labels classification problem, an approach combined with naive bayes and decision tree is proposed in this paper. Because of the noise in instances, the accuracy of decision tree may decrease while overfitting. So, before to analysis the data via decision tree, R-NBC is applied in training dataset in order to remove the noise instances. The results of mixture approach of decision tree and R-NBC are shown that the proposed method is effective for the challenging multilevel classification problem. In fact, it could also help us to extract the importance attributes and high signal noise ratio datasets from the noised dataset with high dimension attributes.(3)Furthermore, in the end of this paper we conclude a GUI for the algorithm which could analysis the data to show the classification results directly on the screen.
Keywords/Search Tags:Classification Algorithm, Naive Bayes, Probability Estimate, Decision Tree
PDF Full Text Request
Related items