
RBF-Q Learning Optimization Algorithm Of Conveyor-serviced Production Station With Multi-type Products

Posted on: 2019-12-02 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Yang | Full Text: PDF
GTID: 2428330548457510 | Subject: Control theory and control engineering
Abstract/Summary:
Many real-world production lines consist mainly of a single production station that performs the processing; such a system is called a conveyor-serviced production station (CSPS). As social demand diversifies, multi-type production has become a trend, since it can improve production efficiency while meeting diversified demand. This thesis studies the look-ahead optimization control problem for a class of CSPS with multi-type products. The objective is to optimize the long-run expected cost of the system by choosing a suitable look-ahead control policy. In theory, the optimization problem can be solved with exact solution techniques, but these rely on accurate model parameters. Q-learning is a model-free algorithm that overcomes this difficulty, but it requires the actions to be discretized and lacks generalization ability. Moreover, as the number of product types increases, the size of the system state space grows exponentially, so the traditional Q-learning algorithm suffers from the curse of dimensionality when faced with a large-scale discrete state space.

(1) To solve these problems, an RBF neural network, which offers good generalization and fast learning, is combined with the Q-learning algorithm and applied to the optimization control of the multi-type CSPS system. The network is used to approximate the value function and to make the action variable continuous. The resulting RBF-Q learning algorithm overcomes both the shortcomings of Q-learning and the dependence of the exact solution on model parameters. The input of the RBF network is a state-action pair, and the output is the Q value of that pair. Simulations are carried out for different numbers of product types. The results show that the RBF-Q learning algorithm can effectively optimize the performance cost of multi-type CSPS systems and improve the learning speed.

(2) Further, we analyze the problems that arise in the learning and updating process of the RBF-Q algorithm, which combines a neural network with reinforcement learning for the look-ahead optimization control of the multi-type CSPS system. We add an experience replay mechanism and a target function to obtain an improved RBF-Q learning algorithm. Simulations are again carried out for different numbers of product types, and the results show that the improved algorithm optimizes the system cost effectively.
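To make the ideas in the abstract concrete, here is a minimal Python sketch of an RBF network that maps a state-action pair to a Q value, trained with Q-learning updates, a replay buffer, and a periodically synchronized copy of the weights. It is not the thesis's implementation. The class `RBFQ`, the function `train`, the toy cost `(a - s) ** 2`, the scalar state, the discounted criterion, the fixed grid of centres, and all hyperparameters are assumptions made for illustration. Reading the "target function" as a frozen weight copy is also an assumption. Continuous actions are approximated here by searching a fine candidate grid.

```python
import math
import random


class RBFQ:
    """Linear-in-weights Gaussian RBF approximator of Q(state, action)."""

    def __init__(self, centers, width, lr=0.1, gamma=0.9):
        self.centers = centers          # (state, action) centre of each hidden unit
        self.width = width              # shared Gaussian width (assumed)
        self.lr = lr                    # learning rate (assumed)
        self.gamma = gamma              # discount factor (assumed)
        self.w = [0.0] * len(centers)   # online output weights
        self.w_target = list(self.w)    # frozen copy used to build targets

    def features(self, s, a):
        # Gaussian activation of every hidden unit for the pair (s, a).
        two_var = 2.0 * self.width ** 2
        return [math.exp(-((s - cs) ** 2 + (a - ca) ** 2) / two_var)
                for cs, ca in self.centers]

    def q(self, s, a, target=False):
        w = self.w_target if target else self.w
        return sum(wi * fi for wi, fi in zip(w, self.features(s, a)))

    def greedy_action(self, s, candidates, target=False):
        # Costs are minimised, so the greedy action has the smallest Q value.
        return min(candidates, key=lambda a: self.q(s, a, target))

    def update(self, s, a, cost, s_next, candidates):
        # One-step Q-learning target, evaluated with the frozen weights.
        a_next = self.greedy_action(s_next, candidates, target=True)
        td_target = cost + self.gamma * self.q(s_next, a_next, target=True)
        td_error = td_target - self.q(s, a)
        phi = self.features(s, a)
        self.w = [wi + self.lr * td_error * fi for wi, fi in zip(self.w, phi)]
        return td_error

    def sync_target(self):
        self.w_target = list(self.w)


def train(steps=1500, batch=4, sync_every=50, capacity=500, seed=0):
    """Train on a toy problem (not a CSPS model): cost is (a - s) ** 2."""
    rng = random.Random(seed)
    grid = [i / 4 for i in range(5)]
    agent = RBFQ([(s, a) for s in grid for a in grid], width=0.25)
    candidates = [i / 10 for i in range(11)]
    replay = []
    s = rng.random()
    for t in range(steps):
        a = rng.choice(candidates)       # purely exploratory behaviour policy
        cost = (a - s) ** 2
        s_next = rng.random()            # toy dynamics: next state is random
        replay.append((s, a, cost, s_next))
        if len(replay) > capacity:
            replay.pop(0)                # drop the oldest transition
        # Replay a small random batch of stored transitions.
        for tr in rng.sample(replay, min(batch, len(replay))):
            agent.update(*tr, candidates)
        if (t + 1) % sync_every == 0:
            agent.sync_target()
        s = s_next
    return agent
```

After `agent = train()`, calling `agent.greedy_action(0.5, [i / 10 for i in range(11)])` returns the candidate action with the lowest estimated cost in state 0.5. Replacing the toy cost and dynamics with a simulator of the actual system would be the next step, and the thesis may use a different performance criterion and network training rule.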
Keywords/Search Tags:Multi-Type products, Conveyor-serviced production station (CSPS), RBF network, Q-learning