
Machine Learning Applications in Genomics, Protein Folding and Protein-Protein Interaction

Posted on: 2018-03-12
Degree: Ph.D.
Type: Thesis
University: University of Massachusetts Boston
Candidate: Farhoodi, Roshanak
GTID: 2440390002999399
Subject: Computer Science
Abstract/Summary:
The field of machine learning, which aims to develop computer algorithms that improve with experience, has in recent years helped scientists understand a vast and diverse array of biological phenomena. Through the analysis of large and complex datasets by efficient and intelligent algorithms, major advances have been made in understanding the biological processes taking place in the cell and the underlying causes of many diseases and abnormalities. Consequently, the development of new drugs and treatments has become possible.

This thesis presents machine learning solutions for three biological problems. The first problem focuses on building models to predict the structural similarity of a docked protein complex to its native form. Using a set of physico-chemical features and evolutionary conservation, these models not only rank candidate complexes relative to each other, but also outperform the built-in scoring functions of the docking programs used to generate the complexes. The second problem studies how point mutations can affect the structure, and consequently the stability, of a protein by employing machine learning methods to predict the change in the free energy of the protein. This approach, which has the potential to provide insight into the effects of multiple amino acid mutations as well as single mutations, does not require costly calculations of energy functions that rely on atomic-level statistical mechanics and molecular energetics. In the third part of this work, a method is proposed to identify reads from paired-end sequencing data that contain inter-chromosomal translocation or insertion breakpoints. The huge search space in this problem is examined by applying a distance-preserving embedding algorithm to solve the approximate nearest neighbor problem. Experimental validation and comparison with similar existing methods show the advantages of this approach in detecting breakpoints efficiently and accurately.
Keywords/Search Tags:Machine learning, Protein