Research On Data Mining Using Bayesian Network

Posted on: 2009-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
Full Text: PDF
GTID: 2178360242985969
Subject: Computer application technology
Abstract/Summary:
Bayesian Network combines Bayesian theory with graph theory. Because it is theoretically rigorous and consistent, and because it offers an efficient local computation mechanism and a visual representation of knowledge, it has attracted considerable attention from researchers in the AI field. In this thesis, Bayesian Network is applied to data from the agricultural domain, specifically to predict the milk output of a farm from its historical data. The complete data mining procedure using Bayesian Network is carried out, and the results are then compared with those of multiple linear regression. The main work and innovations of this thesis are as follows:

1. The concepts and related techniques of data mining are briefly introduced, and the basic principles and methods of Bayesian Network are described in detail.
2. In the data pre-processing stage, a variation of the Chi2 algorithm, put forward by the author in March 2007, is employed. The algorithm discretizes the predictor variables without sacrificing the fidelity of the training data, which makes the data suitable for the Bayesian Network method.
3. In the structure search stage, a greedy algorithm with heuristic rules and a random restart mechanism is applied. Five heuristic rules are derived from domain knowledge and the semantics of the variables, which dramatically reduces the search space. The greedy algorithm with random restarts retains its simplicity while overcoming the tendency to become trapped in a local optimum, and therefore achieves results as good as those of most intelligent search algorithms.
4. After the discrete-valued result is obtained through Bayesian Network inference, the question of how to transform the discrete values into continuous ones is considered in order to improve prediction accuracy, rather than simply using the median of the corresponding interval.

In addition, when the results of the two methods are compared visually, the original data are sorted into a new order to avoid clutter in the scatter plot, so that the better performance of Bayesian Network on this problem is much clearer.
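
The greedy structure search with heuristic rules and random restarts mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The five heuristic rules are not stated in the abstract, so they are modelled here, as an assumption, by a set of forbidden edges. The scoring function is likewise left as a caller-supplied placeholder (a real search would score a structure against the training data), and the function names are hypothetical. Only edge addition and deletion are used as moves.

```python
import random
from itertools import permutations


def creates_cycle(edges, new_edge):
    """Return True if adding new_edge (u, v) to edges would create a directed cycle."""
    u, v = new_edge
    # A cycle appears exactly when u is already reachable from v.
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for (a, b) in edges if a == node)
    return False


def neighbors(edges, nodes, forbidden):
    """Yield every DAG reachable by deleting or adding one allowed edge."""
    for e in edges:
        yield edges - {e}
    for e in permutations(nodes, 2):
        if e not in edges and e not in forbidden and not creates_cycle(edges, e):
            yield edges | {e}


def greedy_search(start, nodes, score, forbidden):
    """Hill-climb from start until no single-edge change improves the score."""
    current, current_score = start, score(start)
    while True:
        best, best_score = None, current_score
        for cand in neighbors(current, nodes, forbidden):
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        if best is None:  # local optimum reached
            return current, current_score
        current, current_score = best, best_score


def random_dag(nodes, forbidden, rng, p=0.3):
    """Draw a random DAG by sampling edges along a random node ordering."""
    order = list(nodes)
    rng.shuffle(order)
    edges = set()
    for i, u in enumerate(order):
        for v in order[i + 1:]:
            if (u, v) not in forbidden and rng.random() < p:
                edges.add((u, v))
    return frozenset(edges)


def random_restart_search(nodes, score, forbidden=frozenset(), restarts=10, seed=0):
    """Run greedy search from the empty graph and from several random DAGs."""
    rng = random.Random(seed)
    best, best_score = greedy_search(frozenset(), nodes, score, forbidden)
    for _ in range(restarts):
        start = random_dag(nodes, forbidden, rng)
        cand, s = greedy_search(start, nodes, score, forbidden)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score
```

A caller would pass the variable names, a scoring function that maps a frozenset of `(parent, child)` edges to a number, and the edges ruled out by domain knowledge, for example forbidding any edge that points from the milk output variable to its predictors. Restricting the candidate edges in this way is what shrinks the search space, and the restarts give the greedy climb several chances to escape a poor local optimum.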
Keywords/Search Tags: Data Mining, Bayesian Network, Variation of Chi2, Greedy Algorithm, Linear Regression