
Studying On Prior Distribution In Bayesian Learning

Posted on: 2002-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Hu
Full Text: PDF
GTID: 2168360032957211
Subject: Computer software and theory
Abstract/Summary:
The Bayesian method originated from the well-known Bayes' theorem and has been developed into a systematic technology of inference and decision. In the 1990s, learnable Bayesian networks were studied and applied to machine learning. With the boom of data mining, Bayesian networks have again attracted great attention, because of the natural relation between mathematical statistics and data mining. Compared with non-Bayesian methods, their prominent feature is that they combine prior and posterior information, which avoids the disadvantages of subjective bias caused by using only prior information, of blind search caused by incomplete sample information, and of sensitivity to noise caused by using only sample information. If we choose a suitable prior, we can conduct Bayesian learning effectively, so the method fits problems of data mining and machine learning that possess probabilistic and statistical characteristics, especially when samples are scarce. The great difficulty in the Bayesian method is that the determination of a prior is guided only by rules of thumb, without a complete and operational theory, and it is hard to evaluate the justice and accuracy of a prior in many conditions. In this thesis we focus on the choice of a suitable prior in Bayesian learning.

First of all, we discuss a foundational problem: consistency in Bayesian learning. A consistency property is obtained, i.e. inference based on consistent learning is free from the prior. By introducing counterexamples we investigate the existence of inconsistency, which reminds us to choose a prior carefully, and we propose a basic principle in Bayesian learning: the consistency principle.
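As a small numerical illustration of this prior-freeness, the following sketch uses a hypothetical Beta-Bernoulli setup (the priors Beta(1, 1) and Beta(5, 2), the true parameter 0.3, and the sample sizes are all illustrative assumptions, not from the thesis). It measures the L1 distance between the posteriors produced by two quite different priors as the sample grows.

```python
import math
import random

def beta_logpdf(x, a, b):
    """Log-density of the Beta(a, b) distribution at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def l1_distance(a1, b1, a2, b2, grid=4000):
    """Riemann-sum approximation of the L1 distance between two Beta densities."""
    h = 1.0 / grid
    return sum(abs(math.exp(beta_logpdf(i * h, a1, b1)) -
                   math.exp(beta_logpdf(i * h, a2, b2)))
               for i in range(1, grid)) * h

random.seed(0)
theta0 = 0.3  # hypothetical true Bernoulli parameter
data = [1 if random.random() < theta0 else 0 for _ in range(10000)]

# Two rather different priors: Beta(1, 1) (uniform) and Beta(5, 2).
# With a Bernoulli likelihood both posteriors are again Beta distributions.
for n in (10, 100, 10000):
    s = sum(data[:n])
    d = l1_distance(1 + s, 1 + n - s, 5 + s, 2 + n - s)
    print(n, round(d, 4))  # L1 distance between the two posteriors
```

Under consistency the printed distances shrink toward zero as n increases, matching the claim that inference based on consistent learning becomes free of the prior.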
We also point out that under certain regularity conditions the posterior distribution is free from the prior and is approximately normal as the sample size increases to infinity. In this thesis we discuss these regularity conditions using a method similar to that of Walker (1967 [21]) and Heyde and Johnstone (1979 [14]), but the conditions have been simplified. The following are some results on this aspect:

Definition 1.1: The pair $(\theta, \pi)$ is said to be consistent at $\theta_0$ if for every neighborhood $V$ of $\theta_0$, $\lim_{n \to \infty} \pi_n(V) = 1$ almost surely; i.e., given any $\varepsilon > 0$ and an arbitrary neighborhood $V$ of $\theta_0$, there exists $n_0$ such that $\pi(\theta \in V \mid X^{(n)}) \ge 1 - \varepsilon$ for all $n \ge n_0$.

Lemma 1.1: Let $\theta_0$ be an interior point of $\Theta$ and let $\pi_1, \pi_2$ be two prior densities which are positive and continuous at $\theta_0$. Assume that the posterior $\pi_i(\theta \mid X^{(n)})$ is consistent for $i = 1, 2$; then
$$\lim_{n \to \infty} \int \bigl| \pi_1(\theta \mid X^{(n)}) - \pi_2(\theta \mid X^{(n)}) \bigr| \, d\theta = 0.$$

Theorem 1.1: If $\mu_n$ and $\nu_n$ are posterior distributions of $\theta$ and they are consistent, then $\int f \, d\mu_n - \int f \, d\nu_n \to 0$ as $n \to \infty$ for all bounded continuous functions $f$.

Suppose that $\theta_0$ is an interior point of $\Theta$. We impose the following regularity conditions throughout:

(C1) The prior density $\pi(\theta)$ is continuous and positive at $\theta_0$.

(C2) $\log f(x \mid \theta)$ is twice differentiable with respect to $\theta$ in some neighborhood of $\theta_0$, and the expected second derivative $E\bigl[\partial^2 \log f(x \mid \theta) / \partial \theta^2\bigr]$ is continuous in $\theta$.

(C3) For any $\delta > 0$ for which $N_\delta(\theta_0) = \{\theta : |\theta - \theta_0| < \delta\} \subset \Theta$, there exists a positive number $k(\delta)$, depending on $\delta$, such that
$$\lim_{n \to \infty} P\Bigl[ \sup_{\theta \in \Theta - N_\delta(\theta_0)} n^{-1} \{ L_n(\theta) - L_n(\theta_0) \} \le -k(\delta) \Bigr] = 1,$$
where $L_n(\theta)$ denotes the log-likelihood of the sample; this obviously implies that $\hat\theta_n$ is weakly consistent, namely that $\operatorname{plim}_{n \to \infty} \hat\theta_n = \theta_0$.

Theorem 2.1: Suppose that conditions (C1) to (C3) hold. Then, if $-\infty < b < a < \infty$, the posterior probability that $\hat\theta_n + b\sigma_n < \theta < \hat\theta_n + a\sigma_n$, namely
$$\int_{\hat\theta_n + b\sigma_n}^{\hat\theta_n + a\sigma_n} \pi_n(\theta \mid X_1, \ldots, X_n) \, d\theta,$$
tends to
$$(2\pi)^{-1/2} \int_b^a e^{-u^2/2} \, du$$
in probability as $n \to \infty$.

Secondly, we regard some guidelines for choosing a prior (such as the conjugate prior, Jeffreys' non-informative prior, and the maximum entropy prior) as heuristic approaches.
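Before turning to those heuristics, the normal limit of Theorem 2.1 can be checked numerically. The sketch below uses an assumed Beta-Bernoulli setup (uniform prior, illustrative counts n = 5000 and s = 1520; none of these numbers come from the thesis) and compares the exact posterior probability of $(\hat\theta_n + b\sigma_n, \hat\theta_n + a\sigma_n)$ with the limit $(2\pi)^{-1/2} \int_b^a e^{-u^2/2} \, du$.

```python
import math

def beta_logpdf(x, a, b):
    """Log-density of the Beta(a, b) distribution at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_prob(a, b, lo, hi, grid=20000):
    """P(lo < theta < hi) under Beta(a, b), by a midpoint Riemann sum."""
    lo, hi = max(lo, 1e-12), min(hi, 1 - 1e-12)
    h = (hi - lo) / grid
    return sum(math.exp(beta_logpdf(lo + (i + 0.5) * h, a, b))
               for i in range(grid)) * h

def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, s = 5000, 1520                 # illustrative sample size and success count
theta_hat = s / n                 # maximum likelihood estimate
sigma_n = math.sqrt(theta_hat * (1 - theta_hat) / n)

b_lo, a_hi = -1.0, 2.0            # endpoints b < a as in Theorem 2.1
post = beta_prob(1 + s, 1 + n - s,               # uniform prior -> Beta posterior
                 theta_hat + b_lo * sigma_n,
                 theta_hat + a_hi * sigma_n)
limit = std_normal_cdf(a_hi) - std_normal_cdf(b_lo)
print(round(post, 3), round(limit, 3))           # the two values nearly agree
```

At this sample size the exact posterior probability and the standard normal probability of $(b, a)$ already differ by well under one percentage point, illustrating the asymptotic normality of the posterior.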
By introducing Bayesian heuristic approaches and combining prior information and sample information, we study the choice of a suitable prior from the viewpoint of optimization. With the loss function and risk function we appraise the justice and accuracy of a prior...
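As a concrete instance of appraising priors through a loss and risk function (squared-error loss here; the Bernoulli model, the sample size n = 50, and the particular Beta priors are illustrative assumptions, not the thesis's own choices), one can compare the worst-case frequentist risk of the posterior-mean estimators induced by several standard priors, including the conjugate and Jeffreys choices mentioned above:

```python
import math

def mse(theta, n, a, b):
    """Exact mean squared error (squared-error risk) of the posterior-mean
    estimator (a + S)/(a + b + n) for Bernoulli(theta) data, S ~ Binomial(n, theta):
    variance of the estimator plus squared bias."""
    bias_sq = (a - (a + b) * theta) ** 2
    return (n * theta * (1 - theta) + bias_sq) / (a + b + n) ** 2

n = 50
priors = {
    "uniform Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(1/2, 1/2)": (0.5, 0.5),
    "constant-risk Beta(sqrt(n)/2, sqrt(n)/2)": (math.sqrt(n) / 2, math.sqrt(n) / 2),
}
thetas = [i / 200 for i in range(201)]  # grid over the parameter space [0, 1]
for name, (a, b) in priors.items():
    worst = max(mse(t, n, a, b) for t in thetas)
    print(name, round(worst, 5))  # worst-case risk over theta
```

The Beta(sqrt(n)/2, sqrt(n)/2) prior makes the risk constant in theta, which is the classical minimax choice for this model; comparing worst-case (or prior-weighted) risk in this way gives one operational measure of a prior's justice and accuracy.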
Keywords/Search Tags: Machine learning, Bayesian method, prior distribution, posterior distribution