
The Prediction Of Interactions Between Protein And Other Kinds Of Molecules Based On Sequences

Posted on: 2020-03-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Shen
GTID: 1480306131967639
Subject: Computer application technology
Abstract/Summary:
With the rapid development of techniques such as high-throughput screening, huge amounts of bioinformatics data demand effective processing. Information and computing science and related techniques have been widely used in interdisciplinary research programmes and have become popular in bioinformatics applications. Machine learning and data mining algorithms are convenient tools for predicting and analyzing the metabolic regulation of biomolecules in living organisms, because computational approaches can both reduce the cost of experiments and improve the efficiency of analysis, and they offer new ideas for solving these problems.

As biologically active molecules, proteins have important functions, and interactions between proteins and other biochemical molecules, including DNA and RNA, play a pivotal role in cell metabolism. The bases that constitute a nucleic acid, and the amino acid residues that constitute the polypeptides of a protein, are arranged in order as sequences, which gives researchers a foundation for approaching the problem computationally. This thesis studies multi-information fusion methods based on multi-scale time-frequency analysis, sparse representation, and multiple kernel learning to predict interactions between proteins and other molecules.

1. This thesis focuses on two kinds of approaches.
(a) It combines compressed observation with multi-scale methods to extract sequence features of proteins and other molecules. Biological information obtained from experiments is often noisy and redundant, so it needs to be observed in a compressed, multi-scale manner.
(b) It describes the fusion of multiple kinds of biological information drawn from different types of spaces. Learning from a single kernel extracts information from only one particular aspect for classification and prediction. When the amount of data is limited, modelling the learner with multi-modal, multi-metric, or multi-kernel methods can extract as much valuable information as possible, and such methods have achieved good results in many fields.

2. This thesis addresses three topics.
(a) The first topic is the prediction of drug-target protein interactions. We propose a method called DAWN. It encodes drugs with a drug substructure fingerprint dictionary, extracts features from target sequences with a multi-scale discrete wavelet transform, and combines these features with network information to perform the prediction. The predictions are produced by a Support Vector Machine (SVM). The advantage of DAWN is that it achieves high prediction performance when network information is available and can still make predictions when it is not.
(b) The second topic is the prediction of protein-ligand binding sites. Inspired by the Average Block (AB) algorithm, we propose an algorithm called Multi-scale Local Average Block (MLAB). Unlike approaches based on 3D structure, MLAB extracts both global and local evolutionary information from the original sequence at multiple scales, so that it can adequately capture multiple overlapping continuous or discontinuous interaction patterns. Combined with Predicted Solvent Accessibility (PSA), Weighted Sparse Representation based Classification (WSRC) is then used to predict protein-ligand binding sites.
(c) The third topic is the prediction of interactions between long noncoding RNA (lncRNA) and proteins, for which two algorithms are proposed. The first, LPI-KTASLP, uses multiple sources of information to generate multiple types of kernels, fuses the kernels by means of Kernel Target Alignment (KTA), and uses low-rank approximation to reduce the computational cost. The final results are obtained by link prediction. The second, LPI-FKLKRR, uses four different similarity matrices in each of the nucleotide and protein spaces and weights them with Fast Kernel Learning (FastKL). The final prediction is produced by Kernel Ridge Regression (KRR).

In summary, the compressed, multi-scale observation and the multi-information fusion based on multiple kernel learning proposed in this thesis take both molecular correlation properties and network topology information into account, use information compression to remove noise and redundancy effectively, and achieve satisfactory prediction performance.
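
To make the kernel-fusion idea in topic (c) more concrete, the following is a minimal sketch, not the thesis's implementation. It assumes a simple heuristic in which each kernel is weighted in proportion to its centered alignment with the label kernel y yᵀ, and the fused kernel is then passed to a closed-form kernel ridge regression. All function names, the toy data, and the parameter values (`gamma`, `lam`) are hypothetical and chosen only for illustration; LPI-KTASLP and LPI-FKLKRR themselves may compute the weights differently and operate on lncRNA and protein similarity matrices rather than on a single feature matrix.

```python
import numpy as np

def center_kernel(K):
    """Center a square kernel matrix in feature space: H K H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K1, K2):
    """Frobenius inner product of two matrices, normalised by their norms."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

def kta_weights(kernels, y):
    """Assumed heuristic: weight each kernel by its (non-negative)
    centered alignment with the target kernel y y^T."""
    target = center_kernel(np.outer(y, y))
    scores = np.array([max(alignment(center_kernel(K), target), 0.0)
                       for K in kernels])
    return scores / scores.sum()

def krr_fit_predict(K_train, y_train, K_test, lam=1.0):
    """Kernel ridge regression: solve (K + lam I) alpha = y, predict K_test alpha."""
    alpha = np.linalg.solve(K_train + lam * np.eye(len(y_train)), y_train)
    return K_test @ alpha

# Toy data (purely synthetic): only the first feature determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
tr, te = np.arange(30), np.arange(30, 40)

# Three candidate kernels over all 40 samples.
sq_norms = np.sum(X ** 2, axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
gamma = 0.2                        # illustrative RBF width
Z = rng.normal(size=(40, 5))       # features unrelated to the labels
full_kernels = [X @ X.T, np.exp(-gamma * sq_dists), Z @ Z.T]

# Learn the weights from training rows only, then fuse and predict.
train_kernels = [K[np.ix_(tr, tr)] for K in full_kernels]
w = kta_weights(train_kernels, y[tr])
K_fused = sum(wi * K for wi, K in zip(w, full_kernels))
pred = krr_fit_predict(K_fused[np.ix_(tr, tr)], y[tr],
                       K_fused[np.ix_(te, tr)], lam=1.0)

print("kernel weights:", np.round(w, 3))
print("predicted signs:", np.sign(pred))
```

The weights are estimated from the training rows only, so the held-out labels never influence the fusion step. Clipping negative alignments to zero keeps the fused matrix a non-negative combination of the input kernels.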
Keywords/Search Tags:Bioinformatics, Feature Extraction, Multiple Kernel Learning, Drug-target Interaction, Protein-DNA Binding Site, LncRNA-protein Interaction