Font Size: a A A

Statistical design and analysis of high throughput screening data using pooling experiments and data mining techniques

Posted on:2005-03-07Degree:Ph.DType:Dissertation
University:North Carolina State UniversityCandidate:Remlinger, Katja SFull Text:PDF
GTID:1458390008981002Subject:Statistics
Abstract/Summary:
Discovery of a new drug involves screening large chemical libraries to identify new and diverse active compounds. Only a very small percentage of the compounds in the library are active. Naive screening approaches of testing all compounds in the library are not desirable since in addition to being expensive, they provide little information on what aspects of the chemical structure of active compounds are related to activity.; This work investigates pooling experiments as one possible approach of improving screening efficiency and gaining insight into the structure-activity relationships. Four different pooling designs are proposed using two design criteria, optimal coverage of the chemical space and minimal collision between compounds. We evaluate each method by determining how well the design criteria are met and whether the methods are able to find many diverse active compounds. One pooling design emerges as a winner, but all designed pools clearly outperform randomly created pools. Furthermore, different analysis approaches of the pooling designs are investigated. Multiple trees are compared to model-based likelihood approaches with different covariate class definitions. Results show that a model-based likelihood approach with a multiple-trees-lower-bound covariate class definition gives the best performance. Another possible approach of improving screening efficiency and gaining insight into the structure-activity relationships is the use of data mining techniques such as RandomForest and ChemTree. These techniques are applied to individual compounds.
Keywords/Search Tags:Screening, Compounds, Data, Pooling
Related items