
Research Of Policy Algorithms Applied To Perceptual Decision-Making Tasks

Posted on: 2020-11-21    Degree: Master    Type: Thesis
Country: China    Candidate: J Y Gao    Full Text: PDF
GTID: 2428330599476464    Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning is a class of learning algorithms in which an agent adjusts its policy through interaction with the environment in order to maximize reward. It has been widely applied to problems such as robotics and chess, and more recently it has also been used to model human behavior and decision making. In particular, the activity of dopamine neurons in the brain follows the same pattern as the prediction error in reinforcement learning algorithms, and many scholars speculate that the brain may adopt an algorithm similar to reinforcement learning when solving decision-making problems. However, decisions based on classical reinforcement learning tend to maximize reward, whereas many behavioral experiments show that animal and human decisions are not entirely rational. Although irrational behavioral data can be fitted by adjusting parameters such as the learning rate, parameter adjustment alone can hardly fit the data when decisions lead to both rational and irrational outcomes. It is therefore necessary to design new reinforcement learning algorithms that explain why both rational (optimal) and irrational (sub-optimal) patterns appear when humans perform perceptual decision-making tasks. Extending reinforcement learning algorithms by fitting behavioral data not only provides new modeling tools for the brain's decision-making computation, but also provides an important reference for designing new reinforcement learning algorithms.

Based on the policy gradient algorithm of reinforcement learning, this thesis designed two policy algorithms that can produce both rational and irrational decision outcomes in perceptual decision-making tasks, and verified their accuracy by comparing and analyzing the similarities and differences between the algorithms and humans performing the same tasks. The main work and results are as follows:

1. To obtain behavioral data, two perceptual decision-making tasks were designed and implemented. Each task comprised three main functional modules: information perception, action, and feedback. In addition, two reward schemes were designed so that the tasks admit both rational (optimal) and irrational (sub-optimal) outcomes. Both tasks were implemented in Python using the PyQt and PsychoPy libraries.

2. Because the classical reinforcement learning algorithm cannot converge to both rational and irrational behaviors when modeling these tasks, two improved policy algorithms were proposed. The first algorithm was obtained from the objective function derived from the policy gradient, with policy parameters composed of rewards and experiences. The second algorithm decomposed the reward into an internal reward and an external reward: the internal reward was related to the prediction of the current state, while the external reward was the feedback on whether the outcome of an action was correct.

3. To verify whether the behavioral patterns converged to the expected outcomes, and whether the policy algorithms could fit the behavioral data, the fitting results were analyzed statistically.

Based on reinforcement learning theory, this thesis designed two improved policy algorithms that can converge to both the optimal and the sub-optimal behaviors that classical reinforcement learning cannot reach. The work offers some reference value for developing new reinforcement learning algorithms. Future research will apply the algorithms to engineering fields such as robot control and chess games.
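The abstract gives no formulas, so the following is only a minimal sketch of the general idea behind a policy-gradient learner whose reward is split into an external and an internal part. It is not the thesis's algorithm. It assumes a stateless two-choice task, a softmax policy, and a REINFORCE update with a running-average baseline. The function names, the weight `beta`, and the choice of the policy's own probability for the chosen action as the "internal reward" are all hypothetical, chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_reinforce(p_correct, n_trials=2000, lr=0.1, beta=0.0, seed=0):
    """REINFORCE on a stateless choice task (illustrative assumption).

    p_correct[a] is the probability that action a receives correct feedback.
    beta weights a hypothetical internal reward; beta=0 gives plain REINFORCE.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    n_actions = len(p_correct)
    prefs = [0.0] * n_actions  # policy parameters
    baseline = 0.0             # running average of the total reward

    for t in range(1, n_trials + 1):
        probs = softmax(prefs)
        action = rng.choices(range(n_actions), weights=probs)[0]

        # External reward: feedback on correctness (1 = correct, 0 = wrong).
        external = 1.0 if rng.random() < p_correct[action] else 0.0
        # Internal reward: a stand-in, here the policy's own confidence
        # in the chosen action. This definition is an assumption.
        internal = probs[action]
        reward = external + beta * internal

        # Gradient of log softmax w.r.t. prefs[i] is 1[i == action] - probs[i].
        for i in range(n_actions):
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * (reward - baseline) * grad

        baseline += (reward - baseline) / t

    return softmax(prefs)


if __name__ == "__main__":
    print(run_reinforce([0.9, 0.1], beta=0.0))
    print(run_reinforce([0.9, 0.1], beta=0.5))
```

With `beta=0.0` the learner is driven by correctness feedback alone. Raising `beta` adds a self-reinforcing term, which is one simple way a learner could settle on a choice it already favors. Whether that resembles the sub-optimal pattern reported in the thesis cannot be judged from the abstract.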
Keywords/Search Tags:reinforcement learning, policy algorithm, decision-making, sub-optimal behavior, optimal behavior