
Nouiehed-Ross’s Conjecture, Berry’s Conjecture For Two-Armed Bandit And Their Applications

Posted on: 2024-07-02   Degree: Doctor   Type: Dissertation
Country: China   Candidate: J C Zhang
GTID: 1528306917495364   Subject: Financial mathematics and financial engineering
Abstract/Summary:
This thesis consists of three parts. The first and second parts study optimal strategies for the two-armed bandit model. The third part studies the properties of an important family of local martingales, the Ocone martingales.

The bandit problem, also known as sequential design or experimental design, is an important class of statistical decision problems. It has received a great deal of attention in the nearly 90 years since its inception and has been applied in many different fields. In biostatistics such problems are called randomized controlled trials. In social science research they are called between-group designs, and in digital marketing they are called A/B tests.

This thesis extends the two-armed bandit model proposed by Feldman and studies the optimal strategy for the extended model. Let $(F_1, F_2)$ be a pair of distributions on the probability space $(\Omega, \mathcal{F}, P)$. The payoff distributions of experiment X and experiment Y (also called the X-arm and the Y-arm) are unknown, but the following two hypotheses can be made:
$$H_1:\ X \sim F_1,\ Y \sim F_2; \qquad H_2:\ X \sim F_2,\ Y \sim F_1,$$
where $\xi_0$ is the prior probability that $H_1$ is true. In trial $i$, either the X-arm or the Y-arm is selected, generating a random payoff $X_i$ or $Y_i$. The posterior probability that $H_1$ is true after $i$ trials is denoted $\xi_i$. The aim is to find the strategy that maximizes the expected utility of the player's funds after the trials. Let the utility function $\varphi(s)$ represent the utility of a player holding funds $s$, and let $x$ be the player's initial funds before playing. This thesis studies the myopic strategy $M_n$: in trial $i$, play the X-arm if $\xi_{i-1} \ge 1/2$, and the Y-arm otherwise.

The study faces two difficulties. First, $F_1$ and $F_2$ are taken to be general distribution functions, continuous or not, rather than Bernoulli distributions. Second, the utility function is general and no longer linear. These features make Feldman's proof method inapplicable. Using new techniques, this thesis overcomes both difficulties and thereby substantially extends Feldman's model. Using dynamic programming, it is proved that the myopic strategy $M_n$ maximizes the expected utility of $n$ trials if and only if $\varphi$, $F_1$ and $F_2$ satisfy
$$\int \varphi(s+u)\,dF_1(u) \;\ge\; \int \varphi(s+u)\,dF_2(u) \quad \text{for all } s. \qquad (0.0.4)$$
Condition (0.0.4) means the following: whatever funds the player already has, if only one trial is to be played, the arm with distribution $F_1$ is always better than the arm with $F_2$. When $\varphi(x) = x$, (0.0.4) coincides with the condition proposed by Feldman. Take the indicator utility $\varphi(s) = I_{[k,+\infty)}(s)$ and initial funds $x = 0$. This choice proves the two-armed case of Nouiehed and Ross's conjecture: the myopic strategy also maximizes the probability of at least $k$ wins in the first $n$ trials, for all $k$ and $n$.

The second part studies another important two-armed bandit model, proposed by Berry. This is an independent Bernoulli two-armed bandit with unknown success probabilities $\rho$ and $\lambda$, whose prior distributions are
$$dR(\rho) = C_R\,\rho^{r_0}(1-\rho)^{r_0'}\,d\mu(\rho), \qquad dL(\lambda) = C_L\,\lambda^{l_0}(1-\lambda)^{l_0'}\,d\mu(\lambda),$$
where $\mu$ is an arbitrary positive measure on $[0,1]$ and $C_R$, $C_L$ are normalizing constants. Berry conjectured the following. Given the pair of priors $(R, L)$ for $\rho$ and $\lambda$, the arm with prior $R$ is the currently optimal choice whenever $r_0 + r_0' < l_0 + l_0'$ and $E[\rho] \ge E[\lambda]$. This thesis gives an easily verifiable equivalent form of Berry's conjecture and uses it to prove two cases. The conjecture holds when $R$ and $L$ are two-point distributions. It also holds when $R$ and $L$ are beta distributions and the number of trials satisfies $N \le (?)+1$.

The third part mainly studies Ocone martingales and the properties of stochastic integrals with respect to Ocone martingales. Let $M$ be a divergent continuous local martingale (that is, $\langle M\rangle_\infty = \infty$) with quadratic variation process $\langle M\rangle$ and $M_0 = 0$. By the well-known Dambis-Dubins-Schwarz (DDS) theorem, there is a Brownian motion $\beta$ such that $M$ is $\beta$ time-changed by $\langle M\rangle$, that is, $M_t = \beta_{\langle M\rangle_t}$. An Ocone martingale is a martingale whose DDS Brownian motion $\beta$ and quadratic variation process $\langle M\rangle$ are independent of each other. Ocone martingales share many properties with Brownian motion and Gaussian martingales, so some theorems originally established for Brownian motion extend to them. For this reason their properties and characterizations have long attracted close attention.

By the theory of stochastic integration, under suitable conditions a stochastic integral with respect to a local martingale is again a local martingale. This thesis asks whether a stochastic integral with respect to an Ocone martingale is again an Ocone martingale, and gives two main results that answer this question. Theorem 4.3.2 gives a sufficient condition for a stochastic integral with respect to an Ocone martingale to be a new Ocone martingale. Theorem 4.3.3 addresses the converse question. In special cases, it gives a necessary condition for a stochastic integral with respect to a local martingale to be an Ocone martingale. It also extends Vostrikova and Yor's theorem (Theorem 4.1.2) to a more general class of integrands. With these theorems one can check whether more complicated stochastic integrals are Ocone martingales and construct further interesting examples of Ocone martingales.

The thesis is organized as follows. Chapter 1 introduces the origin, development, applications, main research methods and well-known strategies of two-armed bandit problems. Chapter 2 introduces the extended Feldman two-armed bandit model and establishes the dynamic programming principle for it. It then obtains the necessary and sufficient condition for the myopic strategy to be optimal, which settles the conjecture proposed by Nouiehed and Ross. Chapter 3 introduces Berry's two-armed bandit model and Berry's conjecture, proposes an equivalent form of the conjecture, and confirms it in two special cases. Chapter 4 reviews the origin of Ocone martingales and the research on them. It obtains two important results on how the Ocone property transfers under integral transformations. Based on these results, two complex examples of Ocone martingales are constructed, showing that the results of this thesis substantially extend those of Vostrikova and Yor. Chapter 5 summarizes the thesis and discusses directions for future work.
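The dynamic-programming comparison behind the Nouiehed-Ross result can be checked numerically on a small case. The Python sketch below is a minimal illustration under stated assumptions, not the thesis's own method. It makes four assumptions:

- The arms are Bernoulli, $F_1 = \text{Bernoulli}(p)$ and $F_2 = \text{Bernoulli}(q)$ with $p > q$.
- Feldman's two hypotheses take the form $H_1: X \sim F_1, Y \sim F_2$ and $H_2: X \sim F_2, Y \sim F_1$.
- The posterior $\xi$ is updated by Bayes' rule.
- The objective is the expected utility of the final funds.

The sketch computes the optimal value by backward induction over (posterior, funds, trials left) and compares it with the value of the myopic rule $M_n$. For Bernoulli arms the one-trial gap in condition (0.0.4) equals $(p-q)(\varphi(s+1)-\varphi(s))$. So with $p > q$ the condition holds exactly when $\varphi$ is nondecreasing on the funds levels checked, and the indicator utility $I_{[k,+\infty)}$ qualifies. The helper names `one_step_condition` and `make_solver` are hypothetical.

```python
from functools import lru_cache


def one_step_condition(p, q, phi, funds_levels):
    """Condition (0.0.4) for hypothetical Bernoulli arms F1 = Bernoulli(p), F2 = Bernoulli(q).

    E[phi(s + Z1)] - E[phi(s + Z2)] = (p - q) * (phi(s + 1) - phi(s)),
    so we check the sign of that product at every funds level s supplied.
    """
    return all((p - q) * (phi(s + 1) - phi(s)) >= 0 for s in funds_levels)


def make_solver(p, q, phi):
    """Return (optimal, myopic) value functions for a Feldman-type Bernoulli bandit.

    Assumed hypotheses: H1: X ~ F1, Y ~ F2;  H2: X ~ F2, Y ~ F1.
    xi is the posterior probability that H1 is true, s the current funds,
    m the number of trials left; both functions return E[phi(final funds)].
    """
    f1 = {0: 1 - p, 1: p}
    f2 = {0: 1 - q, 1: q}

    def outcomes(xi, arm):
        """Yield (payoff z, predictive probability, updated posterior) for one pull."""
        # Under H1 arm X follows F1 and arm Y follows F2; under H2 the roles swap.
        lik_h1, lik_h2 = (f1, f2) if arm == "X" else (f2, f1)
        for z in (0, 1):
            pred = xi * lik_h1[z] + (1 - xi) * lik_h2[z]
            post = xi * lik_h1[z] / pred if pred > 0 else xi  # Bayes' rule
            yield z, pred, post

    def expected(value_fn, xi, s, m, arm):
        # One-step lookahead: average the continuation value over the payoff z.
        return sum(pr * value_fn(post, s + z, m - 1) for z, pr, post in outcomes(xi, arm))

    @lru_cache(maxsize=None)
    def optimal(xi, s, m):
        # Dynamic programming: take the better of the two arms at every stage.
        if m == 0:
            return phi(s)
        return max(expected(optimal, xi, s, m, arm) for arm in ("X", "Y"))

    @lru_cache(maxsize=None)
    def myopic(xi, s, m):
        # Myopic rule M_n: play X if xi >= 1/2, otherwise Y.
        if m == 0:
            return phi(s)
        return expected(myopic, xi, s, m, "X" if xi >= 0.5 else "Y")

    return optimal, myopic


if __name__ == "__main__":
    k, n = 2, 4
    at_least_k = lambda s: 1.0 if s >= k else 0.0  # phi = I_[k, +inf), initial funds x = 0
    optimal, myopic = make_solver(0.7, 0.4, at_least_k)
    print(one_step_condition(0.7, 0.4, at_least_k, range(n + 1)))  # True
    print(optimal(0.5, 0, n), myopic(0.5, 0, n))
```

With the indicator utility, the two printed values should agree up to floating-point rounding, as the Nouiehed-Ross statement predicts. A contrasting case is $\varphi(s) = I_{\{0\}}(s)$, which rewards zero wins only. Condition (0.0.4) then fails. For $n = 1$, $\xi_0 = 0.9$, $p = 0.7$ and $q = 0.4$, the optimal value is $0.57$ (play Y) while the myopic value is $0.33$ (play X).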
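Berry's model can be explored in the same way. The sketch below makes two assumptions for illustration. First, the objective is the expected number of successes in $N$ trials; the abstract does not state the objective. Second, both arms share the measure $\mu$.

Using exact rational arithmetic, it computes two optimal values. $Q_R$ is the optimal value when the first pull is the arm with prior $R$, and $Q_L$ is the same for the arm with prior $L$. Taking $\mu$ to be Lebesgue measure turns $dR(\rho) = C_R\rho^{r_0}(1-\rho)^{r_0'}d\mu(\rho)$ into a Beta$(r_0+1, r_0'+1)$ prior. Its mean, $(r_0+1)/(r_0+r_0'+2)$, is the predictive success probability. A two-point $\mu$ is also provided.

In these terms, Berry's conjecture asserts $Q_R \ge Q_L$ whenever `berry_premise` is true. All function names are hypothetical. A finite search with this code is a numerical sanity check, not a proof.

```python
from fractions import Fraction
from functools import lru_cache


def beta_success_prob(r, r_fail):
    """mu = Lebesgue measure: rho^r (1 - rho)^r_fail d(rho) is a Beta(r + 1, r_fail + 1) prior,
    whose mean (the predictive success probability) is (r + 1) / (r + r_fail + 2)."""
    return Fraction(r + 1, r + r_fail + 2)


def two_point_success_prob(a, b, w_a=1, w_b=1):
    """mu = w_a * delta_a + w_b * delta_b, a hypothetical two-point measure.

    Assumes 0 < a, b < 1 (pass Fractions for exact results) so every posterior is well defined.
    """
    def success_prob(r, r_fail):
        num = w_a * a ** (r + 1) * (1 - a) ** r_fail + w_b * b ** (r + 1) * (1 - b) ** r_fail
        den = w_a * a ** r * (1 - a) ** r_fail + w_b * b ** r * (1 - b) ** r_fail
        return Fraction(num) / Fraction(den)
    return success_prob


def first_pull_values(success_prob, r0, r0_fail, l0, l0_fail, n):
    """Return (Q_R, Q_L): maximal expected successes in n trials when the first pull
    is arm R (resp. arm L) and play is optimal afterwards."""

    @lru_cache(maxsize=None)
    def value(r, rf, l, lf, m):
        # Optimal expected successes in the remaining m trials (dynamic programming).
        if m == 0:
            return Fraction(0)
        return max(pull_r(r, rf, l, lf, m), pull_l(r, rf, l, lf, m))

    def pull_r(r, rf, l, lf, m):
        # A success on R raises the exponent r by one; a failure raises r_fail.
        p = success_prob(r, rf)
        return p * (1 + value(r + 1, rf, l, lf, m - 1)) + (1 - p) * value(r, rf + 1, l, lf, m - 1)

    def pull_l(r, rf, l, lf, m):
        # Same update for arm L and its exponents (l, l_fail).
        p = success_prob(l, lf)
        return p * (1 + value(r, rf, l + 1, lf, m - 1)) + (1 - p) * value(r, rf, l, lf + 1, m - 1)

    return pull_r(r0, r0_fail, l0, l0_fail, n), pull_l(r0, r0_fail, l0, l0_fail, n)


def berry_premise(success_prob, r0, r0_fail, l0, l0_fail):
    """Premise of Berry's conjecture: r0 + r0' < l0 + l0' and E[rho] >= E[lambda]."""
    return r0 + r0_fail < l0 + l0_fail and success_prob(r0, r0_fail) >= success_prob(l0, l0_fail)


if __name__ == "__main__":
    q_r, q_l = first_pull_values(beta_success_prob, 0, 0, 1, 1, 2)
    print(berry_premise(beta_success_prob, 0, 0, 1, 1), q_r, q_l)  # True 13/12 21/20
```

In the example, $R$ is the uniform prior ($r_0 = r_0' = 0$) and $L$ is Beta$(2,2)$ ($l_0 = l_0' = 1$). The means are equal and $R$ is the less informed arm. With $N = 2$, pulling $R$ first is strictly better, since $13/12 > 21/20$.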
Keywords/Search Tags: two-armed bandit, optimal strategy, myopic strategy, dynamic programming property, Berry’s conjecture, Ocone martingale, integral transformation