
Dynamic Bayesian Network Model Based On Analysis Of Samples

Posted on: 2009-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Zhou
Full Text: PDF
GTID: 2178360242981579
Subject: Computer software and theory
Abstract/Summary:
Data mining, a new technology that draws on many disciplines, has emerged and grown with the large volume of accumulated data and the urgent need for information and knowledge in market competition. It is related to statistics, artificial intelligence, databases, machine learning, and other fields. Its main task is to mine the useful information contained in data, discover previously unknown knowledge, and provide a valuable basis for decision-making in commercial competition, enterprise production and management, government departments, and scientific exploration.

Classification is an important data mining technique: a classifier is constructed from the characteristics of a data set and then used to classify samples whose category is unknown. Classifiers are usually constructed with statistical methods, machine learning methods, neural network methods, and so on. The Bayesian network, an important classification technique, is an uncertainty reasoning model based on probability theory. It describes the problem domain qualitatively through its network structure and quantitatively through its probability parameters. It combines the powerful inference of Bayesian theory with the concise expression of graph models, and offers a new approach to analyzing and handling uncertainty. It takes into account both prior information and sample data, makes use of expert knowledge and experience, and supports qualitative and quantitative analysis. Integrating subjective and objective information in this way avoids both over-fitting of the data and the bias caused by subjective factors.

Classical Bayesian network models include the Naive Bayesian classification model, the Tree Augmented Naive Bayesian (TAN) classification model, and the Unrestricted Bayesian classification model. These traditional Bayesian network classification models depend on the relationships among the attributes.
A single training sample set yields only one classification model, but the dependencies among attributes sometimes differ for different attribute values. In that case the model cannot accurately express the dependencies, which degrades the classification results. To address this problem, this thesis proposes a new Bayesian network algorithm, tests it, and analyzes the test results.

Chapter one reviews the origin of data mining and introduces its definition, process, applications, and development trends. It then presents the common classification models, including Decision Tree, Rough Set, Genetic Algorithm, Neural Networks, Bayesian Learning, and so on.

Chapter two describes Bayesian theory and the concepts of information entropy and mutual information, and analyzes the construction of the Naive Bayesian classification model, the Tree Augmented Naive Bayesian classification model, and the Unrestricted Bayesian network model, all of which are based on Bayesian theory.

Chapter three first studies the minimum description length (MDL) principle. A coding algorithm for classification models based on MDL is proposed, together with a classification model assessment algorithm, in preparation for assessing the attribute reduction algorithm and the classification model algorithm. Second, starting from the joint mutual information of information theory, a specific joint mutual information (SJMI) formula is derived and used for attribute reduction. Attribute reduction is a process of attribute selection. First, the SJMI between each attribute and the category attribute is calculated, and the attribute with the greatest SJMI is added to the selected attribute set.
The data set is then narrowed according to the selected attribute set, and the narrowed data set is used to calculate the SJMI between each unselected attribute and the selected attribute set together with the category attribute; the attribute with the greatest SJMI is again added to the selected attribute set. This process is repeated until all samples in the narrowed data set belong to the same category or the number of selected attributes reaches a specified value. The selected attribute set is then used for TAN modeling. Finally, the attribute reduction classification algorithm is tested on the adult and weather data sets, and the classification model assessment algorithm proposed above is used for the assessment. Its performance is compared with that of the ID3 decision tree in weka3-4. Classification based on the attribute reduction algorithm achieves good results.

Chapter four introduces the TAN modeling process in detail and establishes the Dynamic Tree Augmented Naive Bayesian classification model (DTAN). The TAN modeling process consists of building the maximum weight tree, orienting the tree, building the conditional probability table (CPT) for each node, and calculating the classification accuracy. The attribute reduction algorithm based on SJMI is applied first, and the reduced attribute set is then used to build a TAN model based on SJMI. A different TAN model is established for each test sample; the resulting model is called DTAN. Finally, the classification performance of DTAN is assessed on UCI data sets with the MDL-based assessment algorithm. Compared with the TAN classification model, DTAN has higher complexity but also higher classification accuracy and a smaller description length, which makes it a classification model worth further research.

Chapter five summarizes the thesis and discusses prospects for future work.

A traditional Bayesian network represents the dependencies among the attributes, and a single Bayesian network is established for the entire sample set.
However, when the test samples change, a single Bayesian network can no longer correctly represent the dependencies among the attributes, which increases the misclassification rate. Moreover, test samples are classified using dependencies that involve all the attributes, some of which may be unnecessary, which increases the model description length. The DTAN classification model establishes a different TAN for each test sample, which strengthens relevance and improves classification accuracy; after attribute reduction, the TAN model is simpler and its description length is smaller.

The study of Bayesian network classification spans a wide range of fields (such as probability theory, information theory, and machine learning), and many issues remain to be studied. This thesis considers only the TAN model with attribute reduction based on specific joint mutual information, which performs well in classification; further research and experiments on other Bayesian networks are necessary. The DTAN model handles only data sets with discrete attributes. Data sets with continuous attributes must be discretized, which easily causes loss of information. How to build Bayesian network classification models directly on data sets with continuous attributes is worth considering.
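
The greedy attribute selection loop described for chapter three can be illustrated with a minimal Python sketch. The abstract does not give the SJMI formula, so this sketch makes two assumptions: ordinary empirical mutual information computed on the narrowed data set stands in for SJMI, and "narrowing" is read as keeping only the training rows that match the test sample on the attribute just selected. The function names, the `max_attrs` parameter, and the toy weather-style rows are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import log2


def mutual_information(rows, attr, label):
    """Empirical mutual information I(attr; label) in bits over a list of dict rows.

    Assumption: used here as a stand-in for the thesis's SJMI, whose exact
    formula is not stated in the abstract.
    """
    n = len(rows)
    if n == 0:
        return 0.0
    joint = Counter((r[attr], r[label]) for r in rows)
    count_a = Counter(r[attr] for r in rows)
    count_c = Counter(r[label] for r in rows)
    # sum over observed pairs of p(a, c) * log2(p(a, c) / (p(a) * p(c)))
    return sum(
        (n_ac / n) * log2(n_ac * n / (count_a[a] * count_c[c]))
        for (a, c), n_ac in joint.items()
    )


def select_attributes(rows, sample, attrs, label, max_attrs):
    """Greedily pick attributes for one test sample.

    Stops when the narrowed data set holds at most one category or when
    max_attrs attributes have been selected (both stopping rules follow
    the abstract; the narrowing rule is an assumed interpretation).
    """
    selected = []
    remaining = list(attrs)
    subset = rows
    while remaining and len(selected) < max_attrs:
        if len({r[label] for r in subset}) <= 1:
            break  # narrowed data set belongs to a single category
        best = max(remaining, key=lambda a: mutual_information(subset, a, label))
        selected.append(best)
        remaining.remove(best)
        # narrow the data set to rows that agree with the test sample
        subset = [r for r in subset if r[best] == sample[best]]
    return selected


# Hypothetical toy data: "play" is fully determined by "windy".
toy_rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
toy_sample = {"outlook": "sunny", "windy": "no"}
chosen = select_attributes(toy_rows, toy_sample, ["outlook", "windy"], "play", 2)
print(chosen)
```

Because the selection depends on the values of the test sample, a different sample can yield a different attribute set, which is consistent with the abstract's statement that DTAN builds a different TAN model for each test sample. The selected attributes would then be passed to TAN construction, which this sketch does not cover.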
Keywords/Search Tags: Bayesian