
Research On Approximate Algorithms Of POMDP And Application To TCM Therapy Planning

Posted on: 2012-02-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Feng
Full Text: PDF
GTID: 1118330335951303
Subject: Computer Science and Technology
Abstract/Summary:
Sequential decision-making arises frequently in production and daily life, and it has become an active research area in artificial intelligence and control. The Partially Observable Markov Decision Process (POMDP) is a powerful probabilistic model for planning in uncertain environments. However, no exact algorithm can solve large-scale POMDP problems by dynamic programming over the whole belief space. Research on approximate POMDP algorithms is therefore of great theoretical and practical value, and point-based value iteration algorithms have become the main approach. When computing the value function, point-based algorithms perform backup operations over a finite set of reachable belief states. The choice of belief points and the order of backups are the two key issues in point-based methods. Because existing algorithms have drawbacks in both respects, finding a more efficient method of choosing belief points is important for accelerating convergence, and it is one of the main research topics of this dissertation.

Dynamic treatment regime planning is a multi-step sequential decision-making problem under uncertainty in the medical domain. Dynamic sequential intervention is the essential method for treating chronic disease in Traditional Chinese Medicine (TCM). Because TCM follows the principle of individualized treatment and TCM physicians differ in their individual practice, the clinical data contain diverse treatment plans for TCM sequential therapeutic procedures. TCM physicians usually summarize optimal therapy plans and effective experiential knowledge from large-scale clinical data without the help of randomized controlled trials. However, summarizing effective sequential therapy regimes by the traditional method of distilling experience from clinical materials is a long process.
Discovering and identifying optimal dynamic treatment regimes from large-scale multidimensional clinical data is therefore a key research topic in TCM. To address this problem, we propose a POMDP solution for modeling observable TCM clinical data and exploring optimal dynamic treatments. This POMDP model can serve as a powerful tool for discovering dynamic treatment regimes and evaluating clinical treatments in TCM.

The main contents and contributions of this dissertation are as follows:

1. We systematically review the theories and methods of recently proposed point-based approximate POMDP solvers and analyze in depth the key issues in point-based algorithms. This background serves as the basis for the research in this dissertation.

2. We propose a belief selection method based on the uncertainty of belief points (UBBS for short). When expanding the belief set, the algorithm first computes the uncertainties of the reachable belief points, and then selects the belief points that have lower uncertainty and whose 1-norm distance to the current belief set is larger than a threshold. We use two different measures of the uncertainty of a belief state: one uses entropy, and the other uses the gap between the maximal and minimal elements of the belief state. Experimental results indicate that this method obtains an approximate long-term discounted reward using fewer belief states than other point-based algorithms.

3. We propose a new value iteration algorithm based on the shortest Hamiltonian path (Shortest Hamiltonian Path-based Value Iteration, SHP-VI). SHP-VI is also a trial-based algorithm.
This method computes an optimal sequence of actions using an approximate algorithm for the shortest Hamiltonian path, explores an optimal belief trajectory by simulating the interaction between the agent and the environment with the resulting actions, and updates the value function over the encountered belief states in reverse order. Experimental results show that SHP-VI greatly accelerates the computation of the belief trajectory compared with other trial-based methods and reduces the number of iterations needed to solve for the POMDP optimal value function.

4. A key issue in TCM clinical evaluation is how to identify optimal dynamic treatment regimes from large-scale multidimensional clinical data, and we propose a method that uses a POMDP model to solve this problem. This is the first time that POMDP has been applied to optimizing dynamic therapy planning in TCM, and all parameters of the model are calculated from real-world clinical data. In this model, we treat the symptoms that TCM physicians can observe directly as observation variables, and use the K-means clustering algorithm to model health states, setting the number of clusters within a reasonable range. The transition probabilities and observation functions are calculated from a large amount of clinical data. The immediate evaluation of a treatment is measured by the weighted sum of the improvements in all symptoms. We model clinical data on type 2 diabetes mellitus with a POMDP and identify the optimal treatment plans using the PBVI and UBBS algorithms. Experimental results demonstrate that POMDP can help extract effective sequential treatment regimes from data and provide a useful reference for the clinical evaluation of TCM. The experiment also shows the validity of our UBBS algorithm on real-world problems.
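
The belief selection step described for UBBS in contribution 2 can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the dissertation's implementation. The function names, the greedy lowest-uncertainty-first order, the choice to measure distance against the growing set, and the mapping of the max-min gap to an uncertainty score (`1 - gap`) are all hypothetical choices made here; the abstract only states that uncertainty is measured by entropy or by the gap, and that selected points must lie farther than a threshold from the current belief set in 1-norm.

```python
import math


def entropy(belief):
    """Shannon entropy (in nats) of a belief, i.e. a probability vector."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)


def gap_uncertainty(belief):
    """Gap-based uncertainty score (assumed form: 1 - (max - min)).

    A large gap between the largest and smallest entries means the belief
    is concentrated, so the score is low; a flat belief scores close to 1.
    """
    return 1.0 - (max(belief) - min(belief))


def l1_distance_to_set(belief, belief_set):
    """Smallest 1-norm distance from `belief` to any point in `belief_set`."""
    return min(
        sum(abs(p - q) for p, q in zip(belief, other))
        for other in belief_set
    )


def select_beliefs(candidates, belief_set, threshold, uncertainty=entropy):
    """Pick candidate beliefs with low uncertainty that are far from the set.

    Candidates are visited from lowest to highest uncertainty (assumption).
    A candidate is kept only if its 1-norm distance to the current set,
    including points already kept in this call, exceeds `threshold`.
    """
    current = list(belief_set)
    selected = []
    for belief in sorted(candidates, key=uncertainty):
        if l1_distance_to_set(belief, current) > threshold:
            selected.append(belief)
            current.append(belief)
    return selected


if __name__ == "__main__":
    belief_set = [[0.5, 0.5]]  # hypothetical two-state problem
    candidates = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.88, 0.12]]
    print(select_beliefs(candidates, belief_set, threshold=0.2))
    # prints [[0.9, 0.1], [0.2, 0.8]]
```

In this toy run, `[0.88, 0.12]` is rejected because it lies within 0.2 of the already selected `[0.9, 0.1]`, and `[0.55, 0.45]` is rejected because it is too close to the initial point `[0.5, 0.5]`. Passing `uncertainty=gap_uncertainty` switches to the second measure.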
Keywords/Search Tags: Machine Learning, Sequential Decision-making, POMDP, Point-based Value Iteration, Optimal Dynamic Treatment Regime, Traditional Chinese Medicine (TCM)