
Dynamic Pricing in Electronic Retail Markets by Reinforcement Learning

Posted on: 2010-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: J T Wang
Full Text: PDF
GTID: 2189360275477552
Subject: Computer application technology

Abstract/Summary:
With the development of Internet technology, electronic commerce (EC) has been widely adopted, and dynamic pricing problems in electronic retail markets are therefore worth studying. In this thesis, we use reinforcement learning (RL) to study dynamic pricing in electronic retail markets, considering both a monopoly market with a single seller and a duopoly market with two competing sellers.

First, we model the monopoly electronic retail market as a semi-Markov decision process (SMDP). Combining this model with the concept of performance potential, we apply the Q-learning algorithm and the simulated annealing Q-learning (SA-Q) algorithm, both applicable under the average-reward and discounted-reward criteria, to solve the dynamic pricing problem in the monopoly market (a minimal sketch of SA-Q follows this abstract). Simulation results on a numerical example show that both algorithms are effective compared with a simple dynamic pricing algorithm, DF, and that SA-Q balances exploration and exploitation better than plain Q-learning. We also examine how revenue varies with parameters such as the customer arrival rate, the replenishment lead time, the shopper waiting time, and the fraction of captive customers.

In the duopoly electronic retail market, we consider two representative cases: a no-information case and a partial-information case. In the no-information case, neither seller has any information about its competitor, so each seller learns about the environment independently. In the partial-information case, each seller can observe the competitor's state but not its actions or rewards, and we model the market as a Markov game. Given the characteristics of this problem, we introduce the WoLF-PHC algorithm within the performance-potential framework (a sketch also follows this abstract). WoLF-PHC uses variable learning rates to adapt to the actions the competitor performs, whereas SA-Q does not account for variations in the competitor's behavior, so WoLF-PHC adapts to environmental changes better than SA-Q. Several numerical examples illustrate that both algorithms solve the dynamic pricing problem effectively in the above cases, and that WoLF-PHC learns better than SA-Q.
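The following is a minimal sketch of SA-Q learning for an average-reward SMDP, in the spirit of the approach described above. The environment interface (env.reset() and env.step() returning a reward and a sojourn time), the state and action sets, and all parameter values are hypothetical stand-ins rather than the thesis's actual market model.

```python
import math
import random

def sa_q_learning(env, states, actions, steps=5000,
                  alpha=0.1, beta=0.01, t0=10.0, cooling=0.999):
    """Simulated-annealing Q-learning sketch for an average-reward SMDP."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    rho = 0.0      # estimate of the average reward rate
    temp = t0      # annealing temperature controlling exploration
    s = env.reset()
    for _ in range(steps):
        # Metropolis-style selection: accept a random action over the
        # greedy one with probability exp((Q_rand - Q_greedy) / temp).
        greedy = max(actions, key=lambda a: Q[(s, a)])
        cand = random.choice(actions)
        accept = math.exp(min(0.0, (Q[(s, cand)] - Q[(s, greedy)]) / temp))
        a = cand if random.random() < accept else greedy
        # An SMDP step yields reward r accrued over sojourn time tau > 0.
        s2, r, tau = env.step(a)
        # Relative-value update: subtracting rho * tau makes Q track the
        # performance potential instead of the raw accumulated reward.
        target = r - rho * tau + max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        if a == greedy:
            # Refine the average-reward-rate estimate on greedy steps.
            rho += beta * (r / tau - rho)
        temp = max(temp * cooling, 1e-3)  # cool toward greedy behavior
        s = s2
    return Q, rho
```

As the temperature decays, the acceptance probability for inferior actions shrinks, which is how SA-Q trades early exploration for late exploitation more gracefully than a fixed epsilon-greedy Q-learner.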
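The duopoly case relies on WoLF-PHC (Win or Learn Fast policy hill-climbing). Below is a minimal, self-contained sketch of one seller's WoLF-PHC learner; the class and environment names are hypothetical, and the thesis's combination with the performance-potential framework is omitted for brevity.

```python
import random
from collections import defaultdict

class WoLFPHCAgent:
    """Sketch of a WoLF-PHC learner for one seller in a pricing game."""

    def __init__(self, actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        assert len(actions) >= 2
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        uniform = 1.0 / len(actions)
        self.Q = defaultdict(float)                  # Q[(state, action)]
        self.pi = defaultdict(lambda: uniform)       # current mixed policy
        self.avg_pi = defaultdict(lambda: uniform)   # long-run average policy
        self.counts = defaultdict(int)               # state visit counts

    def act(self, s):
        # Sample a price action from the current mixed policy pi(s, .).
        r, acc = random.random(), 0.0
        for a in self.actions:
            acc += self.pi[(s, a)]
            if r <= acc:
                return a
        return self.actions[-1]

    def update(self, s, a, reward, s2):
        # Ordinary Q-learning update of the action values.
        best_next = max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (reward + self.gamma * best_next
                                        - self.Q[(s, a)])
        # Maintain the average policy used by the win/lose test.
        self.counts[s] += 1
        for b in self.actions:
            self.avg_pi[(s, b)] += ((self.pi[(s, b)] - self.avg_pi[(s, b)])
                                    / self.counts[s])
        # Win or Learn Fast: a small step when the current policy beats
        # the average policy, a large step when it is losing.
        winning = (sum(self.pi[(s, b)] * self.Q[(s, b)] for b in self.actions)
                   > sum(self.avg_pi[(s, b)] * self.Q[(s, b)] for b in self.actions))
        delta = self.delta_win if winning else self.delta_lose
        # Hill-climb the policy toward the greedy action on the simplex.
        greedy = max(self.actions, key=lambda b: self.Q[(s, b)])
        for b in self.actions:
            step = delta if b == greedy else -delta / (len(self.actions) - 1)
            self.pi[(s, b)] = min(1.0, max(0.0, self.pi[(s, b)] + step))
        total = sum(self.pi[(s, b)] for b in self.actions)
        for b in self.actions:
            self.pi[(s, b)] /= total  # renormalize after clipping
```

The variable step size delta is what lets the learner react quickly when the competitor's pricing turns the game against it, which matches the adaptivity advantage over SA-Q claimed above.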
Keywords/Search Tags:Dynamic pricing, Reinforcement learning (RL), Performance potential, Semi-Markov decision process (SMDP), WoLF-PHC algorithm