
Dynamic Pricing in Electronic Retail Markets by Reinforcement Learning

Posted on: 2010-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: J T Wang
Full Text: PDF
GTID: 2189360275477552
Subject: Computer application technology

Abstract/Summary:
With the development of Internet technology, electronic commerce (EC) has been widely adopted, and dynamic pricing problems in electronic retail markets are therefore worth studying. In this thesis, we use reinforcement learning (RL) to study dynamic pricing in electronic retail markets, considering both a monopoly market with a single seller and a duopoly market with two competing sellers.

First, we model the monopoly electronic retail market as a semi-Markov decision process (SMDP). Combining this model with the concept of performance potential, we apply the Q-learning algorithm and the simulated annealing Q-learning (SA-Q) algorithm, both applicable under the average-reward and discounted-reward criteria, to solve the dynamic pricing problem in the monopoly market (a minimal sketch of SA-Q follows this abstract). Simulation results on a numerical example show that both algorithms are effective compared with a simple dynamic pricing algorithm, DF, and that SA-Q balances exploration and exploitation better than plain Q-learning. We also examine how revenue varies with parameters such as the customer arrival rate, the replenishment lead time, the shopper waiting time, and the fraction of captive customers.

In the duopoly electronic retail market, we consider two representative cases: a no-information case and a partial-information case. In the no-information case, neither seller has any information about its competitor, so each seller learns about the environment independently. In the partial-information case, each seller can observe the competitor's state but not its actions or rewards, and we model the market as a Markov game. Given the characteristics of this problem, we introduce the WoLF-PHC algorithm within the performance-potential framework (a sketch also follows this abstract). WoLF-PHC uses variable learning rates to adapt to the actions the competitor performs, whereas SA-Q does not account for variations in the competitor's behavior, so WoLF-PHC adapts to environmental changes better than SA-Q. Several numerical examples illustrate that both algorithms solve the dynamic pricing problem effectively in the above cases, and that WoLF-PHC learns better than SA-Q.
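The following is a minimal sketch of SA-Q learning for an average-reward SMDP, in the spirit of the approach described above. The environment interface (env.reset() and env.step() returning a reward and a sojourn time), the state and action sets, and all parameter values are hypothetical stand-ins rather than the thesis's actual market model.

```python
import math
import random

def sa_q_learning(env, states, actions, steps=5000,
                  alpha=0.1, beta=0.01, t0=10.0, cooling=0.999):
    """Simulated-annealing Q-learning sketch for an average-reward SMDP."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    rho = 0.0      # estimate of the average reward rate
    temp = t0      # annealing temperature controlling exploration
    s = env.reset()
    for _ in range(steps):
        # Metropolis-style selection: accept a random action over the
        # greedy one with probability exp((Q_rand - Q_greedy) / temp).
        greedy = max(actions, key=lambda a: Q[(s, a)])
        cand = random.choice(actions)
        accept = math.exp(min(0.0, (Q[(s, cand)] - Q[(s, greedy)]) / temp))
        a = cand if random.random() < accept else greedy
        # An SMDP step yields reward r accrued over sojourn time tau > 0.
        s2, r, tau = env.step(a)
        # Relative-value update: subtracting rho * tau makes Q track the
        # performance potential instead of the raw accumulated reward.
        target = r - rho * tau + max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        if a == greedy:
            # Refine the average-reward-rate estimate on greedy steps.
            rho += beta * (r / tau - rho)
        temp = max(temp * cooling, 1e-3)  # cool toward greedy behavior
        s = s2
    return Q, rho
```

As the temperature decays, the acceptance probability for inferior actions shrinks, which is how SA-Q trades early exploration for late exploitation more gracefully than a fixed epsilon-greedy Q-learner.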
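The duopoly case relies on WoLF-PHC (Win or Learn Fast policy hill-climbing). Below is a minimal, self-contained sketch of one seller's WoLF-PHC learner; the class and environment names are hypothetical, and the thesis's combination with the performance-potential framework is omitted for brevity.

```python
import random
from collections import defaultdict

class WoLFPHCAgent:
    """Sketch of a WoLF-PHC learner for one seller in a pricing game."""

    def __init__(self, actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        assert len(actions) >= 2
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        uniform = 1.0 / len(actions)
        self.Q = defaultdict(float)                  # Q[(state, action)]
        self.pi = defaultdict(lambda: uniform)       # current mixed policy
        self.avg_pi = defaultdict(lambda: uniform)   # long-run average policy
        self.counts = defaultdict(int)               # state visit counts

    def act(self, s):
        # Sample a price action from the current mixed policy pi(s, .).
        r, acc = random.random(), 0.0
        for a in self.actions:
            acc += self.pi[(s, a)]
            if r <= acc:
                return a
        return self.actions[-1]

    def update(self, s, a, reward, s2):
        # Ordinary Q-learning update of the action values.
        best_next = max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (reward + self.gamma * best_next
                                        - self.Q[(s, a)])
        # Maintain the average policy used by the win/lose test.
        self.counts[s] += 1
        for b in self.actions:
            self.avg_pi[(s, b)] += ((self.pi[(s, b)] - self.avg_pi[(s, b)])
                                    / self.counts[s])
        # Win or Learn Fast: a small step when the current policy beats
        # the average policy, a large step when it is losing.
        winning = (sum(self.pi[(s, b)] * self.Q[(s, b)] for b in self.actions)
                   > sum(self.avg_pi[(s, b)] * self.Q[(s, b)] for b in self.actions))
        delta = self.delta_win if winning else self.delta_lose
        # Hill-climb the policy toward the greedy action on the simplex.
        greedy = max(self.actions, key=lambda b: self.Q[(s, b)])
        for b in self.actions:
            step = delta if b == greedy else -delta / (len(self.actions) - 1)
            self.pi[(s, b)] = min(1.0, max(0.0, self.pi[(s, b)] + step))
        total = sum(self.pi[(s, b)] for b in self.actions)
        for b in self.actions:
            self.pi[(s, b)] /= total  # renormalize after clipping
```

The variable step size delta is what lets the learner react quickly when the competitor's pricing turns the game against it, which matches the adaptivity advantage over SA-Q claimed above.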
Keywords/Search Tags:Dynamic pricing, Reinforcement learning (RL), Performance potential, Semi-Markov decision process (SMDP), WoLF-PHC algorithm