Flotation is an important process for achieving high-efficiency, high-quality beneficiation. Through the action of flotation reagents, it creates or enlarges the difference in hydrophilicity and hydrophobicity between valuable ore particles and useless gangue particles, so that fine ore particles can be separated. Level control of the flotation cell has an important influence on the flotation process. As a typical complex industrial process, flotation presents the following characteristics and difficulties: the agitation tank is connected to the flotation tank, so the levels of the two tanks affect each other; the ore feed flow into the agitation tank is affected by upstream processes (crushing, grinding, and thickening), which introduce unpredictable disturbances around the set value; the level of the agitation tank fluctuates frequently during mixing and is therefore difficult to measure; and the boundary conditions of the flotation process are affected by production requirements, the grade of the mineral raw material, and other factors, so the operating point changes frequently and fixed models fail. In view of these problems, this paper proposes an input-output feedback Q-learning H∞ tracking control algorithm and an input-output feedback off-policy H∞ tracking control algorithm for flotation cell level control, taking into account the disturbance in the feed rate of the agitation tank. The work in this paper includes the following aspects:

· Research on data-driven input-output feedback Q-learning H∞ tracking control. First, an augmented system is constructed to represent the dynamics of the system output; its state consists entirely of historical input and output data. The tracking control problem can then be
realized indirectly through H∞ stabilization control of the augmented system, which resolves the problem that output feedback data alone cannot fully reflect the state of the system. The H∞ control problem is treated as a zero-sum game, in which the control input acts as the player minimizing the H∞ tracking performance index, while the disturbance input acts as the opponent maximizing it. The output feedback H∞ set-point tracking control problem is thereby transformed into the solution of a game Riccati equation, and a Q-learning algorithm is used to solve it.

· Data-driven input-output feedback H∞ level control based on an off-policy algorithm. The off-policy algorithm involves two different policies during learning: a behavior (performance) policy that generates the data, and a target policy that is learned from the data. In the Q-learning algorithm, by contrast, the policy that generates the data is the same as the target policy. The off-policy algorithm assumes that the input is not generated by the target policy and compensates the Q-learning Bellman equation accordingly; the data-driven off-policy control algorithm is implemented by adding two auxiliary terms. The off-policy algorithm makes up for several shortcomings of Q-learning. During off-policy learning, changes to the target policy do not affect the behavior policy, so the control policy applied to the plant does not need to change frequently, and ensuring the stability of the behavior policy is sufficient to guarantee the stability of the control system throughout learning. Q-learning requires that the target policy be applied to the system, yet the input of the system is not completely determined by the target policy, so probing noise must be added to ensure that the system is sufficiently explored. This probing noise is not accounted for in the Q-learning equations and may therefore bias the learning results. The performance
policy, being different from the target policy, eliminates the influence of probing noise on the learning results and makes the off-policy method an unbiased algorithm.

· MATLAB simulation of the flotation process level control model with the Q-learning control algorithm and the off-policy control algorithm. The effectiveness of both algorithms is verified and the simulation results are analyzed. A comparison of the two shows that the optimal policy obtained by the off-policy algorithm outperforms the Q-learning optimal policy in tracking speed and disturbance rejection, and the experimental results confirm that the off-policy algorithm is unbiased, eliminating the influence of the probing noise.
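The zero-sum game formulation underlying both algorithms can be summarized compactly. In standard discrete-time H∞ notation (generic symbols here, not the thesis's augmented-system variables), with state x_k, control u_k, and disturbance w_k for a plant x_{k+1} = A x_k + B u_k + E w_k, the game index and the associated game Riccati equation are:

```latex
% Zero-sum game index: the control input minimizes J, the disturbance maximizes it.
J(u,w)=\sum_{k=0}^{\infty}\left(x_k^{\top}Q\,x_k+u_k^{\top}R\,u_k-\gamma^{2}w_k^{\top}w_k\right)

% Saddle-point value V(x)=x^{\top}Px, where P solves the discrete-time game Riccati equation:
P=Q+A^{\top}PA-A^{\top}P\begin{bmatrix}B&E\end{bmatrix}
\left(\begin{bmatrix}R&0\\0&-\gamma^{2}I\end{bmatrix}
+\begin{bmatrix}B^{\top}\\E^{\top}\end{bmatrix}P\begin{bmatrix}B&E\end{bmatrix}\right)^{-1}
\begin{bmatrix}B^{\top}\\E^{\top}\end{bmatrix}PA
```

Both proposed algorithms estimate the solution of this equation from measured input-output data, without explicit knowledge of A, B, or E.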
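As an illustration of the Q-learning step described in the first bullet, the following is a minimal sketch of value-iteration Q-learning for a scalar zero-sum game. The plant coefficients, weights, and γ below are invented for illustration and are not the thesis's flotation model: the Q-function kernel H is fitted by least squares from sampled transitions, and the saddle-point gains and value kernel P are read off from H at each iteration.

```python
import numpy as np

# Illustrative scalar plant (NOT the flotation model from the thesis):
#   x_{k+1} = a*x + b*u + e*w,  stage cost  q*x^2 + r_u*u^2 - gamma^2*w^2
a, b, e = 0.9, 1.0, 0.5
q, r_u, gamma = 1.0, 1.0, 5.0

rng = np.random.default_rng(0)

def phi(x, u, w):
    """Quadratic basis for the symmetric 3x3 Q-function kernel H (6 parameters)."""
    return np.array([x*x, 2*x*u, 2*x*w, u*u, 2*u*w, w*w])

def H_from_theta(t):
    """Rebuild the symmetric kernel H from the fitted parameter vector."""
    return np.array([[t[0], t[1], t[2]],
                     [t[1], t[3], t[4]],
                     [t[2], t[4], t[5]]])

def gains_and_value(H):
    """Saddle-point gains (u = ku*x, w = kw*x) and value kernel P from H."""
    g = -np.linalg.solve(H[1:, 1:], H[1:, 0])  # joint minimizer/maximizer
    P = H[0, 0] + H[0, 1:] @ g                 # value at the saddle point
    return g[0], g[1], P

P = 0.0
for _ in range(60):                # value iteration on the Q-kernel
    Phi, y = [], []
    for _ in range(200):           # arbitrary exploratory transitions
        x, u, w = rng.uniform(-1, 1, 3)
        xn = a*x + b*u + e*w
        y.append(q*x*x + r_u*u*u - gamma**2*w*w + P*xn*xn)
        Phi.append(phi(x, u, w))
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    ku, kw, P = gains_and_value(H_from_theta(theta))

print("control gain:", ku, "disturbance gain:", kw, "value kernel P:", P)
```

Because the samples need not come from the target policy's closed loop, this is also the starting point for the off-policy variant, which adds compensating terms to the Bellman equation instead of injecting probing noise into the applied control.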