
A software tool architecture to assist disease gene identification

Posted on: 2002-07-05
Degree: Ph.D.
Type: Thesis
University: The University of Iowa
Candidate: Braun, Terry A
Full Text: PDF
GTID: 2464390011492960
Subject: Biology
Abstract/Summary:
The Human Genome Project (HGP), the publicly funded effort to read the complete nucleotide sequence of the human genome, has to date produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary “draft” format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (Escherichia coli, Saccharomyces cerevisiae, the fruit fly Drosophila melanogaster, the worm Caenorhabditis elegans, and the laboratory mouse), gene discovery projects (ESTs and full-length), and new technologies and resources for expression analysis (microarrays or gene chips). These resources are invaluable to researchers identifying the functional genes of the genome, which are transcribed and translated into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of these data identified approximately 30,000–40,000 human “genes.” A consequence of the HGP is the existence of hundreds of databases containing biological information and data relevant to the identification of disease-causing genes. Applying high-performance computational methods to existing biological databases, and extracting information from them, to assist disease gene identification involves several distinct steps: (1) acquiring data, (2) finding interrelated and gene-related information, (3) filtering data, (4) integrating information, (5) nominating candidate disease genes, and (6) prioritizing candidate disease genes. This thesis proposes a software tool architecture to facilitate the use of genomic and biological data resources to nominate and prioritize candidate disease genes.
Various components of the system have been designed, implemented, and applied to mine biological databases for specific genomic intervals in order to nominate candidate disease genes and to identify gene-related information and novel sequence characteristics associated with Bardet-Biedl Syndrome (BBS) and autism.
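
The six steps named in the abstract can be pictured as a small pipeline. What follows is only an illustrative sketch under stated assumptions, not the architecture proposed in the thesis: the `GeneRecord` fields, the in-memory sample data, every function name, and the count-based score are hypothetical, and a real system would query external biological databases rather than hard-coded lists.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical record; real databases expose far richer fields."""
    symbol: str
    chrom: str
    start: int
    end: int
    annotations: set = field(default_factory=set)


def acquire():
    """Step 1: acquire data (made-up records standing in for two sources)."""
    positions = [
        GeneRecord("GENE_A", "chr1", 100, 200),
        GeneRecord("GENE_B", "chr1", 450, 600),
        GeneRecord("GENE_C", "chr2", 120, 180),
    ]
    annotations = {
        "GENE_A": {"expressed_in_tissue"},
        "GENE_B": {"expressed_in_tissue", "conserved_in_mouse"},
    }
    return positions, annotations


def relate_and_integrate(positions, annotations):
    """Steps 2 and 4: attach gene-related information to each gene record."""
    for gene in positions:
        gene.annotations |= annotations.get(gene.symbol, set())
    return positions


def nominate(genes, chrom, start, end):
    """Steps 3 and 5: keep genes overlapping the genomic interval of interest."""
    return [
        g for g in genes
        if g.chrom == chrom and g.start <= end and g.end >= start
    ]


def prioritize(candidates, evidence_of_interest):
    """Step 6: rank candidates by a toy score (count of matching annotations)."""
    return sorted(
        candidates,
        key=lambda g: len(g.annotations & evidence_of_interest),
        reverse=True,
    )


def run_pipeline(chrom, start, end, evidence_of_interest):
    """Run all six steps and return candidate gene symbols, best first."""
    positions, annotations = acquire()
    genes = relate_and_integrate(positions, annotations)
    candidates = nominate(genes, chrom, start, end)
    return [g.symbol for g in prioritize(candidates, evidence_of_interest)]
```

Calling `run_pipeline("chr1", 0, 1000, {"expressed_in_tissue", "conserved_in_mouse"})` on this toy data ranks `GENE_B` ahead of `GENE_A`, because it matches more of the requested evidence. Keeping each step as a separate function mirrors the abstract's point that the steps are distinct, so any one of them could be swapped for a different data source or scoring rule.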
Keywords/Search Tags:Disease, Human genome, HGP, Information