| With the continuous improvement of human living standards and the promotion of repeated epidemics,the sales model of fresh food has been continuously updated,and community group purchases have gradually developed.The problem of inventory strategy optimization has become an urgent problem to be solved in community group purchases.The complex constraints and huge solution space in the inventory strategy optimization problem increase the difficulty of its solution.At present,traditional methods have certain limitations in solving inventory policy optimization,and reinforcement learning,as the field of machine learning that has received the most attention recently,has good applications in many problems.Therefore,this paper aims to explore the application of reinforcement learning algorithms in actual logistics.It focuses on the application of reinforcement learning method to solve the optimization problem of fresh inventory strategy under different demand conditions.The main work of the paper is as follows:First,establish a Q-learning model to solve the optimization problem of fresh inventory strategy for periodic demand.First,analyze the existing problems in the management of fresh food inventory in community group purchases,and start from the main factors affecting inventory,elaborate the definition of demand,deterioration rate and inventory cost and their impact on inventory management;secondly,through the design of reinforcement learning quadruple(environmental State observation,agent action,state transition,reward),built a fresh product inventory strategy optimization model based on Q-learning algorithm,and selected user demand data with periodic and random characteristics to train and test the model.It can be seen from the experimental results that when the demand distribution is known,the product deterioration rate is a function of time and preservation cost input,and the preservation cost input is a fixed constant,the ordering strategy based on this model can effectively reduce the expiration cost of the research object,Reduce total inventory cost.Controlling other factors to remain unchanged,changing the cost of fresh-keeping to change the deterioration rate,and exploring the influence of the rate of deterioration on the inventory strategy,the optimal fresh-keeping cost input for the research object is obtained.To sum up,in practical applications,compared with the traditional inventory cost control strategy,the inventory cost control strategy based on Q-learning can control the inventory cost more effectively.Second,in the case of periodic demand,the fresh product inventory strategy optimization model based on the Q-learning algorithm can reduce the expiration cost and inventory cost of fresh products.However,the Q-learning algorithm itself has limited state space and action space,and actions Problems such as step size make it difficult for the model to process high-dimensional big data,and it is difficult to achieve optimality.In order for the model to meet the application requirements of the current big data era,this paper introduces a method combining neural network demand forecasting and Actor-Critic algorithm inventory strategy optimization to solve this problem.Firstly,the demand forecast based on LSTM is carried out,and the result of the demand forecast is used as the input of the Actor-Critic algorithm into the inventory control optimization.The demand forecasting based on LSTM can effectively solve the problem that the Actor-Critic algorithm is difficult to converge.Secondly,Q-learning is replaced by the Actor-Critic algorithm.The Actor-Critic algorithm can just solve the problem of dimension disaster that the Q-learning algorithm cannot handle and solve the problem of high variance.and high bias issues.Compared with the inventory strategy optimization model based on the Q-learning algorithm,the inventory strategy optimization model based on the Actor-Critic algorithm can not only further reduce the freshness Product damage rate and total inventory cost,and also solve the problem of dimensional disaster.Combining LSTM-based demand forecasting and Actor-Critic algorithm-based inventory control model can alleviate the difficult convergence problem of Actor-Critic algorithm.It can be seen that,compared with the inventory optimization strategy based on the Q-learning algorithm,the fresh product retailer inventory strategy optimization model based on the LSTM demand forecast and the Actor-Critic algorithm has a better optimization effect for the random demand with a large amount of data,and more practical application value. |