Research On Theory And Application Of Thompson Sampling In Sequential Decision-Making Problems

Posted on:2020-05-09

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z Y Zhu

Full Text:PDF

GTID:1360330578981651

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

Many real-world applications can be modelled as sequential decision-making prob-lem.In a sequential decision-making problem,the learning agent interacts with an un-known and noisy environment to maximize the cumulative reward.Thompson sam-pling is one of the most promising strategies to tackle this problem.It utilizes Bayesian strategy to balance the exploration-exploitation tradeoff.Despite being empirically ef-ficient and powerful,Thompson sampling has eluded theoretical understanding.How the information is shared across different strategies is not fully understood and why Thompson Sampling is effective to balance the exploration-exploitation tradeoff is not thoroughly studied.Moreover,in many practical domains,the stochastic and station-ary assumptions about the environment are often violated.These drawbacks hinder the practical deployment of Thompson sampling in highly dynamic and large-scale do-mains.In this thesis,we extend current researches and focus on the theoretical guaran-tee of Thompson sampling and its practical deployment.The details are illustrated as follows:1.By introducing a martingale-based analysis technique,we demonstrate that the frequentist regret upper bound of linear Thompson sampling matches its Bayesian regret bound and the information-theoretic lower bound for this problem.Our analysis precisely qualifies the exploration-exploitation tradeoff and partially re-veals how the information is transferred between different actions in linear set-tings.We also conduct empirical experiments to demonstrate the assumptions and the lemmas.2.By decomposing the instantaneous regret and utilizing matrix martingale theory,we prove that the expected regret upper bound of linear combinatorial cascading Thompson sampling matches the regret bounds for state-of-the-art UCB-based algorithms.Our analysis reveals that linear combinatorial cascading Thompson sampling explores every dimension of feature space and exploits current informa-tion efficiently.We also experiment on two real-world datasets to demonstrate the advantage of our algorithm over existing works.3.We incorporate collaborative effects into Thompson sampling so that the algo-rithm can capture real-time changes of the environment and adjust decision-making strategy accordingly.Our algorithm shows accelerated convergence and improved prediction performance over standard Thompson sampling in collaborative envi-ronments.We also provide regret analyses in both contextual and non-contextual settings.Our theoretical analyses prove that collaborative Thompson sampling outperforms standard Thompson sampling in the collaborative environments.4.We design and analyze collaborative contextual combinatorial cascading Thomp-son sampling to address the sequential decision-making problem in a collabora-tive environment.By analyzing the change of variance of the reward estimator,we prove that the algorithm can efficiently explore the unknown cluster structure in the environment.The experiments conducted on synthetic dataset demonstrate the advantage of incorporating online clustering.

Keywords/Search Tags:

Sequential Decision Making, Exploration-Exploitation Tradeoff, Thompson Sampling, Stochastic Model, Contextual Model

PDF Full Text Request

Related items

1	The Study On Optimization Method For Sequential Decision-making In Supply Chain
2	Portfolio Model Construction And Risk Measurement Based On Adaptive Multi-armED Bandit Algorithm
3	Study On A Sequential Modeling Of Pairwise Comparison Process And Decision Making Methods
4	Sequential Decision Models Based On Q-Learning Algorithm
5	Stochastic DEA Model And Its Application In Anti-HIV Immunotherapy
6	A Kind Of Sequential Decision Method Based On MCTS With Its Application In The Acupoints' Ranking Schema
7	A Class Of Multi-objective Decision-making Analysis With True And False Function
8	Research On Sequential Multi-class Three-way Decision Model
9	Research On Multi-attribute Decision Making Based On Cloud Model
10	Weapons Test Analysis Methods Based On Bayes Sequential Decision-making