
Inferring Associations Between Diseases And Long Non-coding RNAs Based On Projecting And Clustering

Posted on: 2019-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 2404330572452118
Subject: Computer application technology
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are non-coding RNA molecules longer than 200 nt. Many studies show that they participate in the regulation of various biological processes in cells and play a crucial role in the occurrence and development of many diseases. Studying the relationship between lncRNAs and diseases is therefore of profound significance for human health. However, because lncRNAs are diverse and structurally complex, the mechanisms by which they act in biological processes are complicated, and research on lncRNAs is still at an early stage. Most studies of the lncRNA-disease relationship predict associations between a single lncRNA and diseases. In contrast, we aim to find classes of similar lncRNAs and classes of similar diseases, such that each lncRNA class is closely related to its corresponding disease class. By functionally analyzing the lncRNA classes, we can identify what the members of each class have in common, and these shared features are likely to be important factors in the occurrence and development of the diseases.

We downloaded the experimentally verified data set from the LncRNADisease database. After simple data processing, we represent all sample data as a 0-1 matrix. To obtain meaningful classes, we propose using a matrix factorization algorithm to project the sample data into a high-dimensional space and then applying hierarchical clustering, in keeping with the hierarchical structure of diseases. A Fisher discriminant function is used to determine the number of clusters. In the end, we obtain eight disease classes and their related lncRNA classes. We compare several matrix factorization algorithms and analyze the Davies-Bouldin index (DBI) of the final clustering results. Although the matrix factorization used in recommender systems is applied here for the first time to the lncRNA-disease association problem, it achieves the best performance. Finally, we use a Gene Ontology annotation tool to analyze the functions of the lncRNAs in some classes and find their enriched functions.

The results of this work are as follows.
1. Of the eight clusters, five have clear grouping information; the remaining classes are less clear-cut, each containing at least two types of disease.
2. While analyzing the lncRNAs in the tumor disease class and consulting a large body of literature, we found a long non-coding RNA, MINA, that had no recorded disease association. We speculate that MINA is also very likely to be associated with tumor diseases.
3. Based on the targeting relationship between lncRNAs and genes, we use the GO annotation tool DAVID to annotate the functions of the target genes. Under the hypothesis that genes with similar sequences have similar functions, we obtain the enriched functions of four lncRNA classes.

In summary, unlike analyses of the relationship between a single lncRNA and diseases, this work analyzes the relationship between lncRNAs and diseases globally and describes lncRNAs in terms of their common features. It provides a reference for understanding disease mechanisms and for treatment.
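
The pipeline described in the abstract (a 0-1 association matrix, matrix factorization, hierarchical clustering, and evaluation with the DBI) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The toy matrix is invented, the orientation (rows as diseases, columns as long non-coding RNAs) is an assumption, and the function names `factorize`, `agglomerative`, and `davies_bouldin` are hypothetical. The factorization is a generic recommender-style one fitted by stochastic gradient descent, the clustering uses average linkage, and the number of clusters is fixed by hand rather than chosen with a Fisher discriminant function.

```python
import math
import random

def factorize(R, k=2, epochs=2000, lr=0.05, reg=0.02, seed=0):
    """Recommender-style factorization R ~ P Q^T, fitted by SGD over all entries."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in R]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in R[0]]
    for _ in range(epochs):
        for i, row in enumerate(R):
            for j, r in enumerate(row):
                err = r - sum(p * q for p, q in zip(P[i], Q[j]))
                for f in range(k):
                    p, q = P[i][f], Q[j][f]
                    P[i][f] += lr * (err * q - reg * p)
                    Q[j][f] += lr * (err * p - reg * q)
    return P, Q

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, n_clusters):
    """Average-linkage hierarchical clustering, cut at n_clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better); needs at least two clusters."""
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    cents = {l: [sum(c) / len(g) for c in zip(*g)] for l, g in groups.items()}
    scatter = {l: sum(dist(p, cents[l]) for p in g) / len(g)
               for l, g in groups.items()}
    worst = [max((scatter[a] + scatter[b]) / dist(cents[a], cents[b])
                 for b in groups if b != a) for a in groups]
    return sum(worst) / len(groups)

# Invented toy data: rows are diseases, columns are lncRNAs (assumed orientation).
R = [[1, 1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 1]]

P, Q = factorize(R)              # P holds one latent vector per disease
labels = agglomerative(P, 2)     # cluster diseases in the latent space
dbi = davies_bouldin(P, labels)
print(labels, round(dbi, 4))
```

On real data, one way to compare factorization methods in this setup would be to rerun the clustering on each method's latent vectors and compare the resulting DBI values, with lower values indicating more compact and better separated clusters.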
Keywords/Search Tags: mapping, matrix factorization, clustering, Gene Ontology annotation