
Research Of Reinforcement Learning With Value Function On Direct Marketing

Posted on: 2019-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: P C Li
Full Text: PDF
GTID: 2428330542497978
Subject: Computer application technology
Abstract/Summary:
Direct Marketing is a marketing model in which customers respond directly to companies. As a long-term business activity, Direct Marketing runs through the entire course of a company's development, so the Long-Term Value (LTV) of customers is usually taken as the indicator of marketing effectiveness. In recent years, with the rapid development of intelligent technology, more and more companies hope to use Machine Learning (ML) to make marketing decisions. However, traditional Supervised Learning (SL) and Unsupervised Learning (UL) methods have severe limitations on this problem: they can only maximize the immediate profit of a single decision, whereas Direct Marketing requires sequences of decisions to be made over time.

Reinforcement Learning (RL) is an important branch of Machine Learning and is mainly used to solve sequential decision problems. In RL, the agent continuously interacts with the environment and learns the mapping from states to actions from the environment's delayed rewards, so as to maximize the cumulative discounted reward. The process of Direct Marketing is also a sequential decision-making process, and its goal of long-term profit maximization coincides with RL's goal of maximizing the cumulative discounted reward. Therefore, RL methods have a great advantage in solving Direct Marketing decision problems, which is the starting point of this research. In order to meet practical requirements effectively, this research focuses on value-function-based Reinforcement Learning methods.

This research aims to solve three problems: the time intervals between decision points in Direct Marketing scenarios are not fixed, large-scale data limits the learning speed, and the customer state is only partially observable. We propose corresponding improvements and evaluate them by simulation. The details are as follows.

On the one hand, to address the problems that the time intervals between decision points are not fixed and that large-scale data limits the learning speed of the model, this research proposes improved Q-learning algorithms based on the classical Q-learning algorithm. Specifically, a mean normalization method is used to reduce the noise that the varying time intervals between decision points introduce into the reward signal. A standardization factor is then constructed for Q-learning, updated in the same way as the value function, which reduces the deviation of the Q-value function caused by unsynchronized time intervals during iteration; this yields the Interval-Q algorithm. In addition, to address the slow training of the Interval-Q algorithm on large-scale data, this research introduces the Temporal Difference (TD) bias into the Q sampling method and proposes a Q sampling method based on the TD bias. Simulation experiments show that the Interval-Q algorithm achieves higher profits in irregular Direct Marketing, and the TD-bias-based Q sampling method achieves better results while reducing the number of samples; a sketch of these two ideas follows.
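The abstract does not give the exact update rules, so the following is only a minimal sketch of the general idea under stated assumptions: the discount factor is raised to the power of the elapsed time between decision points (a standard semi-MDP-style correction), and the absolute TD error is used as a sampling priority, in the spirit of prioritized replay. The class and function names, hyperparameters, and buffer layout are illustrative assumptions, not the thesis's actual Interval-Q implementation.

```python
import random
from collections import defaultdict

# Illustrative sketch only: interval-aware tabular Q-learning with
# TD-error-based sampling. Names and hyperparameters are assumptions,
# not the thesis's actual Interval-Q algorithm.
class IntervalQ:
    def __init__(self, actions, alpha=0.1, gamma=0.95):
        self.q = defaultdict(float)   # Q[(state, action)]
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # per-unit-time discount factor

    def td_error(self, s, a, r, s_next, dt):
        # The discount scales with the elapsed time dt between decision
        # points, so transitions with unequal intervals stay comparable.
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        return r + (self.gamma ** dt) * best_next - self.q[(s, a)]

    def update(self, s, a, r, s_next, dt):
        self.q[(s, a)] += self.alpha * self.td_error(s, a, r, s_next, dt)

def td_priority_sample(agent, buffer, k):
    # Draw k transitions with probability proportional to |TD error|,
    # so high-error samples are revisited first and fewer samples are
    # needed overall.
    weights = [abs(agent.td_error(*t)) + 1e-6 for t in buffer]
    return random.choices(buffer, weights=weights, k=k)
```

Here each transition in `buffer` is assumed to be a tuple `(s, a, r, s_next, dt)`; the time-scaled discount is what keeps the Q-value update unbiased when decision points are irregularly spaced.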
On the other hand, to address the problem that traditional Reinforcement Learning algorithms cannot effectively deal with the partially observable customer state in Direct Marketing, this research studies the DQN (Deep Q-Network) model and proposes an improved DQN model based on two networks. Specifically, considering the temporality of marketing, the problem is first tackled with a DQN model based on an RNN (termed DQN_RNN), which learns the hidden customer state. The research then points out that learning the hidden state and approximating the value function within the optimization of a single network is a challenge for the DQN_RNN model. Therefore, following the idea of hybrid models, a DQN model based on two networks is proposed: the hidden state is learned from the supervision data through the RNN, and the hidden state output by the RNN is then used as the input state of the DQN network for Reinforcement Learning. In this way, the advantages of the two networks are fully exploited: the hidden state is learned better while the approximation of the value function is improved. Finally, to obtain a better strategy, this research proposes three improved models by analyzing the network structure and training method: the separate training model with two networks, the one-step joint training model with two networks, and the two-step joint training model with two networks; a sketch of the two-network structure follows. Simulation experiments show that the DQN model based on two networks proposed in this paper achieves higher returns in regular Direct Marketing.
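The thesis does not publish its network code, so the following PyTorch fragment is only a minimal sketch of the two-network idea under assumed shapes and names: a GRU encodes the customer's interaction history into a hidden state, and a separate feed-forward Q-network maps that hidden state to action values. Layer sizes, dimensions, and class names are illustrative.

```python
import torch
import torch.nn as nn

# Minimal sketch of the two-network idea (assumed shapes and names,
# not the thesis's actual architecture): an RNN learns the hidden
# customer state; a separate Q-network approximates action values.
class StateRNN(nn.Module):
    def __init__(self, obs_dim, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)

    def forward(self, obs_seq):
        # obs_seq: (batch, seq_len, obs_dim) customer interaction history
        _, h = self.gru(obs_seq)
        return h.squeeze(0)           # (batch, hidden_dim) hidden state

class QNetwork(nn.Module):
    def __init__(self, hidden_dim, n_actions):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, state):
        return self.head(state)       # Q-value per marketing action

# Separate-training variant: the RNN is first fit on supervision data,
# then frozen while the Q-network is trained on its output states.
rnn = StateRNN(obs_dim=10)
qnet = QNetwork(hidden_dim=64, n_actions=4)
with torch.no_grad():
    state = rnn(torch.randn(32, 5, 10))   # 32 histories of length 5
q_values = qnet(state)                    # (32, 4)
```

In the joint-training variants described above, the TD loss would presumably also be back-propagated through the RNN encoder, either in one step or in two alternating steps.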
Keywords/Search Tags:Reinforcement Learning, Value function, Q-learning algorithm, Deep Q Network, Direct Marketing