Using Bayesian case reconstruction in experimental science: The crystallographer's assistant

Posted on: 2004-01-28
Degree: Ph.D
Type: Thesis
University: University of Pittsburgh
Candidate: Hennessy, Daniel Nicholas
Full Text: PDF
GTID: 2468390011462662
Subject: Computer Science
Abstract/Summary:
Macromolecule crystallization, like many other experimental scientific and medical domains, can be characterized by: (1) a large number of variables, many of which have a large number of possible values and some of which can be manipulated, (2) “given” variables that have limited predictive value, (3) strong interdependence among the variables, (4) a complex theory that is not applicable in the laboratory, including an incomplete domain theory and an incomplete record of past experiments (the “case library”), and (5) time, material and human resource limitations.

Such domains create a series of competing challenges. The incompleteness of the domain theory implies that we do not have the understanding needed to compute an answer to the problem directly. Machine Learning techniques provide mechanisms for augmenting domain models with knowledge extracted from the case library. However, the case library is itself so incomplete that the additional knowledge would still be insufficient to compute a solution. Furthermore, the complexity of these domains suggests that computing such a solution would be prohibitively expensive even if we had the understanding. Case-Based Reasoning (CBR) is an alternative approach that is often applicable to problems with incomplete domain theories. However, the combination of a large search space and a limited case library makes it necessary to expand the coverage of the cases if CBR is to be a viable solution.

Bayesian Case Reconstruction (BCR) is a new case-based technique that attempts to address these issues. BCR broadens the coverage of a case library by sampling and recombining pieces of cases to construct a large set of “plausible” cases. It employs a Bayesian network to evaluate whether implicit dependencies within the cases have been maintained. The Bayesian network is constructed from (1) the limited structure available in the incomplete domain model, (2) assumptions about the probability distributions (derived from the limited understanding of the domain), and (3) the data available in the case library. In BCR, the cases are the primary reasoning vehicle. The Bayesian network leverages the available domain model to evaluate whether the “plausible” cases have maintained the necessary internal context. Limitations of the case library are mitigated by the explicit knowledge of the domain model incorporated in the Bayesian network; limitations of the domain model are mitigated by the implicit knowledge available in the case library. Bayesian Case Reconstruction is used in the Probabilistic Screen Design program, which is the core of the XtalGrow suite of software.

This thesis addresses the following question: can Bayesian Case Reconstruction be an effective technique for reasoning in experimental science domains? In particular, can it be used in macromolecular crystallization to produce crystallization screens that outperform (i.e., produce more crystals than) the predominant screens currently in use?

The Probabilistic Screen Design (PSD) program is evaluated empirically by applying the screens it generates to a set of 10 test proteins. The PSD screen was applied under the same conditions as two standard screens. An analysis of the results shows that PSD significantly outperformed the other screening methods.
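The recombine-then-evaluate idea behind BCR can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the case library, the variable names (`precipitant`, `buffer`, `pH`), the chain-shaped dependency structure in `PARENTS`, and the function names are all hypothetical, and a small network with Laplace-smoothed frequency tables stands in for the actual Bayesian network. The sketch only shows candidate cases being assembled from pieces of library cases and then ranked by how well they preserve the dependencies between variables.

```python
import random
from collections import Counter
from math import log

# Hypothetical toy case library; variables and values are illustrative only.
CASES = [
    {"precipitant": "PEG", "buffer": "HEPES", "pH": 7.5},
    {"precipitant": "PEG", "buffer": "Tris", "pH": 8.5},
    {"precipitant": "salt", "buffer": "acetate", "pH": 4.5},
    {"precipitant": "salt", "buffer": "citrate", "pH": 5.5},
]

# Assumed dependency structure (child -> parent). None marks a root variable.
# In BCR this structure would come from the partial domain model.
PARENTS = {"precipitant": None, "buffer": "precipitant", "pH": "buffer"}


def recombine(cases, rng):
    """Build a candidate by drawing each variable from a randomly chosen case."""
    return {var: rng.choice(cases)[var] for var in PARENTS}


def log_score(candidate, cases, alpha=1.0):
    """Log-probability of a candidate under the assumed network.

    Conditional tables are estimated from the library with Laplace
    smoothing (alpha), so unseen combinations get a low but nonzero score.
    """
    total = 0.0
    for var, parent in PARENTS.items():
        domain = {c[var] for c in cases}
        if parent is None:
            matching = cases
        else:
            matching = [c for c in cases if c[parent] == candidate[parent]]
        counts = Counter(c[var] for c in matching)
        numerator = counts[candidate[var]] + alpha
        denominator = len(matching) + alpha * len(domain)
        total += log(numerator / denominator)
    return total


def plausible_cases(cases, n_samples=200, top_k=5, seed=0):
    """Sample recombined cases, deduplicate them, and keep the best-scoring ones."""
    rng = random.Random(seed)
    unique = {tuple(sorted(recombine(cases, rng).items())) for _ in range(n_samples)}
    ranked = sorted((dict(u) for u in unique),
                    key=lambda c: log_score(c, cases), reverse=True)
    return ranked[:top_k]
```

Under these assumptions, a candidate that keeps co-occurring values together (for example, a buffer paired with the pH it was observed at) scores higher than one that mixes values never seen together, which is the sense in which the network checks that a recombined case has "maintained the necessary internal context."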
Keywords/Search Tags:Bayesian case reconstruction, Domain, Experimental, PSD, Variables, Screens, Large, Crystallization