
Research On Metric Learning-based Biological Data Mining

Posted on: 2018-01-06  Degree: Master  Type: Thesis
Country: China  Candidate: H Hong  Full Text: PDF
GTID: 2370330569998724  Subject: Biomedical engineering
Abstract/Summary:
Similarity measures play a central role in biological data analysis, and samples from different backgrounds call for their own similarity models. An accurate similarity measure between samples can greatly improve the effectiveness of follow-up analysis, but building one requires a thorough understanding of both the underlying biology and the samples. Designing a suitable similarity measure for a particular problem has therefore long been an attractive but challenging problem for researchers. In most early studies, researchers designed similarity models empirically for specific problems, which may have drawbacks. In the era of big data, the sheer volume of data means that an empirically designed similarity model cannot make use of all the information the data contain. Metric learning offers a better solution. It is a data-driven machine learning approach whose goal is to learn an appropriate similarity measure between samples, and it has been applied successfully to face recognition. In recent years, several studies have begun to apply metric learning to biological data analysis, in problems related to proteomics and genomics. This thesis discusses similarity-related issues in biological data mining. By applying metric learning to gene expression profile data and drug-target interaction data, we obtain better experimental results than the original methods. The main contributions fall into two parts:

1) Gene expression profile data analysis. A gene expression profile is the spectrum of a cell's gene transcription levels in a specific state, and it quantitatively describes that state. At present, the widely used similarity measure for gene expression profiles is an empirically designed model, the ScoreGSEA algorithm. Based on the large-scale LINCS dataset, we carried out two tasks. First, we analyzed the content of the LINCS dataset and identified the information available for further analysis. By comparing several metric learning methods with the ScoreGSEA algorithm, we found that ScoreGSEA does not suit every situation and that the most suitable similarity measure for gene expression profiles varies with the circumstances. Second, to address the lack of Gene Ontology (GO) term annotations for many genes, we constructed a gene knockout similarity network by means of metric learning. Finally, we predicted GO biological process (BP) terms for unannotated genes by clustering analysis.

2) Drug-target interaction analysis. Drug-target interaction prediction is of great significance to drug discovery and drug repositioning research. It is common to compute similarities for drugs and targets as input to the prediction, but recent work has used only classical similarity models to compute them. This thesis therefore constructs a weakly supervised metric learning model named CSML, which is based on cosine similarity and uses information from the ATC classification system. We use the bipartite graph inference model as the base predictor and design an iterative conjugate gradient method to optimize the model. The correctness of the algorithm is proved. Applying CSML significantly improves the performance of the base predictor and further demonstrates the importance of metric learning.
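
As a rough illustration of what a cosine-similarity-based learned metric can look like, the sketch below computes the cosine similarity of two feature vectors after a linear map A is applied to both. With A set to the identity it reduces to ordinary cosine similarity. This is not the CSML model from the thesis, whose formulation is not given in the abstract; the names `apply_linear_map` and `learned_cosine_similarity`, the matrix `A`, and the toy vectors are all assumptions made for illustration, and the matrix here is fixed by hand rather than learned.

```python
import math


def apply_linear_map(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def learned_cosine_similarity(x, y, A):
    """Cosine similarity of x and y after mapping both through A.

    Hypothetical helper: in a metric learning setting A would be fitted
    to data; here it is simply passed in.
    """
    ax = apply_linear_map(A, x)
    ay = apply_linear_map(A, y)
    dot = sum(p * q for p, q in zip(ax, ay))
    denom = math.sqrt(sum(p * p for p in ax)) * math.sqrt(sum(q * q for q in ay))
    # Return 0.0 when either mapped vector is all zeros.
    return dot / denom if denom > 0 else 0.0


# Toy feature vectors (illustrative only).
x = [1.0, 0.0]
y = [1.0, 1.0]

identity = [[1.0, 0.0], [0.0, 1.0]]
# A hand-picked map that down-weights the second feature.
scaled = [[1.0, 0.0], [0.0, 0.1]]

base_sim = learned_cosine_similarity(x, y, identity)
scaled_sim = learned_cosine_similarity(x, y, scaled)
```

Down-weighting the feature on which `x` and `y` disagree raises their similarity, which is the kind of effect a learned map is meant to produce when side information (for example, shared class labels) says two samples should be close.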
Keywords/Search Tags:Similarity Measure, Metric Learning, Gene Expression Profile, Gene Ontology, Drug-Target Interaction, ATC Classification System