Font Size: a A A

Improved Drift Detection Framework For Piecewise-Stationary Multi-Armed Bandit Problem

Posted on:2021-03-12Degree:MasterType:Thesis
Country:ChinaCandidate:Z C PengFull Text:PDF
GTID:2427330620968095Subject:Statistics
Abstract/Summary:PDF Full Text Request
In the study of sequential decision-making,multi-armed bandit decision algorithm is one of the commonly used methods,which aims to make the best choice under the condition of minimizing the cumulative regret.In the uneven data flow,potential offsets increase the cost of exploring decision plan.Therefore,this thesis mainly studies how to effectively detect offsets in non-stationary scenarios and reduce their unintended impact on decision results,so as to control the upper bound of cumulative regret.The previous studies describe the statistical characteristics of sequential data in non-stationary scenarios,such as piecewise constant mean.But in real scene,the change of data distribution is not just that.This article discusses two new changes for the first time(1)continuous and piecewise linear mean(2)piecewise constant variance and piecewise constant mean.In this paper,the adaptive conceptual offset detection task is mainly accomplished by using the definition method and convergence properties of model evaluation and selection in statistical learning theory,then a new offset detection algorithm is proposed.This method calculates the sample loss(sigmoid loss or piecewise linear loss)based on the SVM estimator,and use the loss to test the trend offset between the local data,thereby achieving a better cumulative regret.In addition,considering the existence of outliers,many traditional detection methods of identifying change points are ineffective or even fail.Often they will infer additional changepoints to fit the outliers.To overcome this problem,this thesis focus on distinguishing the difference between outlier and change point(or trend drift).This thesis proposes two new algorithm based on truncated loss function design and process stability inspection,which can achieve nearly optimal regret bounds.In order to further illustrate the improvement of the above methods,this thesis explains the reasons for these results by false positive rate and the detection delay.We compare some algorithms proposed in this thesis with the state of-the-art algorithms in simulation experiments and real data scenarios to demonstrate their excellent performance.
Keywords/Search Tags:Piecewise-stationary, Error rate, Outlier, Truncated loss function, Offset trajectory recognition
PDF Full Text Request
Related items