Computational approach to identify deletions or duplications within a gene

Posted on: 2007-05-18
Degree: Ph.D.
Type: Dissertation
University: The University of Iowa
Candidate: Kalari, Krishna Rani
Full Text: PDF
GTID: 1448390005468259
Subject: Biology
Abstract/Summary:
Although high-throughput methods exist to identify most small disease-causing mutations (e.g., substitutions that alter an amino acid), assays to identify larger classes of mutations, such as deletions and duplications, are time-consuming, laborious, and expensive. No in-silico system exists to identify intragene deletion or duplication candidates. We hypothesize that a computational system, SPeeDD (System to Prioritize Deletion or Duplication candidates), utilizing machine learning techniques, can be employed to identify the most likely disease-causing deletion or duplication candidates within a gene.

Informative sequence-based features were obtained for data mining from a set of genes with known intragene deletions or duplications, and machine learning techniques were applied to these data. The logistic model tree (LMT) method, which combines a decision tree with logistic regression models, yielded the best results. Specificity exceeded 90% for all methods evaluated, while sensitivity ranged from 20% to 71.6% depending on the machine learning method used. The system was also able to find the new BRCA1 case.

These results suggest that the SPeeDD system provides good sensitivity and specificity and can be used to prioritize candidate genes and gene regions for screening. Focused screening for copy number variations in prioritized regions will reduce the labor and associated costs of the biological assays, and should accelerate the process of mutation discovery.
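For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how sensitivity and specificity are computed from binary predictions. It is not the SPeeDD implementation; the function name `sensitivity_specificity` and the toy labels are hypothetical and chosen only for illustration (1 marks a region with a known deletion or duplication, 0 a region without one).

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Sensitivity: fraction of true positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of true negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels, not data from the dissertation.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(toy_true, toy_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

A classifier tuned for high specificity, as described in the abstract, flags few regions without a deletion or duplication, which is what keeps the follow-up screening workload small even when sensitivity is moderate.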
Keywords/Search Tags:Identify, Machine learning, Deletion, Duplication