
Heuristic Learning Model Based On Partially Observable Markov Decision Process

Posted on: 2022-04-30    Degree: Master    Type: Thesis
Country: China    Candidate: J Luo    Full Text: PDF
GTID: 2518306557470634    Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of knowledge and technology, how to master them and fulfill related tasks as quickly as possible has become a significant research topic. Considering that individuals have different capabilities and that some of them can be given higher priority, the key challenge is to improve the quality and efficiency of learning in a heuristic way while reducing unnecessary time consumption and cost. To address the limitations of traditional heuristic learning and to optimize the allocation of learning resources, we propose a heuristic learning model based on the partially observable Markov decision process (HL-POMDP), as an advanced alternative to uniform-sampling and greedy-strategy learning methods. The proposed HL-POMDP method uses an exponentially weighted moving average to compare the aggregated learning effects of users and dynamically allocate learning resources accordingly. Furthermore, resource usage is optimized through a stop condition that terminates learning once it is satisfied. As a result, HL-POMDP guarantees better learning quality for high-priority users while improving the overall learning efficiency of all users, including the high-priority ones. Finally, LSTM neural networks trained on a multi-digit decimal addition task are used to simulate users with different capabilities. Extensive experiments validate the effectiveness of HL-POMDP, which outperforms the uniform-sampling and greedy-strategy learning methods in terms of learning quality and accuracy.
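To make the resource-allocation idea concrete, the sketch below illustrates one way an exponentially weighted moving average of observed learning effects could drive priority-aware allocation with a stop condition. This is a minimal illustration under stated assumptions, not the thesis implementation: the class name EWMAAllocator, the decay factor alpha, the stop threshold, and the priority-times-gap scoring rule are all hypothetical choices introduced here.

```python
import numpy as np

# Minimal sketch (assumed, not the thesis implementation): an EWMA of
# per-user learning effects decides where the next learning resource goes.
class EWMAAllocator:
    def __init__(self, n_users, priority, alpha=0.3, stop_threshold=0.95):
        self.ewma = np.zeros(n_users)          # aggregated learning effect per user
        self.priority = np.asarray(priority, dtype=float)  # higher value = higher priority
        self.alpha = alpha                     # EWMA decay factor (illustrative value)
        self.stop_threshold = stop_threshold   # stop condition (illustrative value)
        self.active = np.ones(n_users, dtype=bool)

    def update(self, user, observed_effect):
        """Fold a newly observed (partial) learning effect into the user's EWMA."""
        self.ewma[user] = self.alpha * observed_effect + (1 - self.alpha) * self.ewma[user]
        # Stop condition: release resources once the aggregated effect is high enough.
        if self.ewma[user] >= self.stop_threshold:
            self.active[user] = False

    def next_user(self):
        """Pick the next user to train: among still-active users, favour high
        priority weighted by how far the user is from the stop threshold."""
        if not self.active.any():
            return None
        gap = np.maximum(self.stop_threshold - self.ewma, 0.0)
        score = np.where(self.active, self.priority * gap, -np.inf)
        return int(np.argmax(score))

# Example: three simulated users, the first with the highest priority.
alloc = EWMAAllocator(n_users=3, priority=[3, 2, 1])
for step in range(100):
    u = alloc.next_user()
    if u is None:
        break
    alloc.update(u, observed_effect=np.random.rand())  # stand-in for a measured effect
```

The priority-times-gap score is only one plausible rule; any scoring that prefers high-priority users who have not yet met the stop condition would fit the same allocation scheme.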
Keywords/Search Tags: reinforcement learning, partially observable Markov decision process, priority, guided learning