
piRNA Identification, Data Simulation and piRNA Association Study with Disease

Posted on: 2018-03-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Liu
GTID: 1364330542973096
Subject: Computer application technology
Abstract/Summary:
Epigenetics has been a hotspot of life-science research in recent years, and non-coding RNA research is an important part of it. Research on miRNA and long non-coding RNA (lncRNA) has reached a mature stage, whereas piRNA research is still in its infancy and is attracting growing attention from researchers. piRNAs, discovered in 2006, are a class of small non-coding RNAs (sncRNAs) that interact with PIWI-family proteins. A growing body of literature shows that piRNAs play important roles in biological processes such as transposon silencing and the regulation of messenger RNA (mRNA) and lncRNA. In cancer research in particular, piRNAs have emerged as new molecular markers and therapeutic targets. Therefore, simulating piRNA expression data, identifying piRNAs accurately and rapidly, and studying their associations with disease will open new perspectives on the understanding, diagnosis and treatment of disease. This thesis focuses on the bioinformatics of piRNA. Its main contributions are outlined below.

1. A simulation method for piRNA expression data is proposed, in which risk models and disease-development models are designed to simulate the occurrence and progression of diseases. With advances in molecular biology, expression data have become an important resource for exploring complex human diseases. Although the volume of expression data continues to grow, such data still suffer from many problems, including high dimensionality, small sample sizes, high noise and an unbalanced distribution of sample types. Real expression data therefore cannot effectively support studies of piRNA-disease association, which limits the development of piRNA bioinformatics. Simulated data consequently play a vital role in association studies between piRNAs and complex diseases. In this study, a simulation method for expression data is introduced to model the occurrence and progression of diseases. Six risk models are proposed; each can be introduced into a baseline simulated dataset carrying case-control labels. In addition, time-series gene expression data can be generated to model the dynamic evolution of a disease. Disease-associated piRNAs are tested by significance analysis of microarrays (SAM). The results show that most disease-related piRNAs can be detected successfully, whereas those from models with low prevalence are difficult to detect. These results indicate that the method is viable for simulating piRNA expression data.

2. We develop an algorithm, based on sequence features and a support vector machine (SVM) classifier, that distinguishes piRNAs from other ncRNAs and provides a better balance between precision and sensitivity in piRNA identification. Distinguishing piRNAs from other non-coding RNAs is essential, and it is challenging because piRNAs lack conserved secondary-structure motifs and sequence homology. Few computational studies have addressed piRNA detection, and both the effectiveness and the efficiency of existing piRNA detection tools need improvement. In this study, a piRNA detection method based on sequence features and an SVM classifier was developed. Four features are proposed: weighted k-mer, weighted k-mer with wildcards, position-specific base, and piRNA length. piRNA sequences from human, mouse, rat and Drosophila were used separately in the experiments. Compared with existing methods, the proposed method provides a better balance between precision and sensitivity (both approximately 90%). Although slightly slower than the Piano approach, it was four-fold faster than piRPred and 229-fold faster than piRNA predictor.

3. We predict cancer-related piRNA–mRNA and piRNA–lncRNA interactions, taking both the sequence data and the expression data of the RNAs into account. We found that the functions of potential piRNA targets are enriched in "activating invasion and metastasis" and "sustained angiogenesis", which are closely related to the hallmarks of cancer. piRNAs have become valuable biomarkers in cancer. Recent research shows that piRNA-mediated cleavage acts on transposable elements (TEs), mRNAs and lncRNAs. Meanwhile, some studies have found that piRNAs may participate in tumorigenesis by regulating cancer-related genes. However, interactions between piRNAs and other RNAs in cancer have rarely been reported. This study aimed to predict cancer-associated piRNA–mRNA and piRNA–lncRNA interactions as well as piRNA regulatory functions. Four cancer types (breast invasive carcinoma, head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma and lung adenocarcinoma) were investigated. Interactions were identified by integrated analysis of expression and sequence data. We identified 198 piRNA–mRNA and 10 piRNA–lncRNA pairs. Unlike mRNA and lncRNA expression, piRNA expression was relatively consistent across the cancer types. Furthermore, the identified piRNAs were consistent with previously published cancer biomarkers, such as piRNA-36741, piR-21032 and piRNA-57125. More importantly, piRNA functions were predicted by constructing an interaction network, and piRNA targets were assigned to Gene Ontology categories related to the cancer hallmarks "activating invasion and metastasis" and "sustained angiogenesis".

4. By combining weighted gene co-expression network analysis (WGCNA) with functional enrichment analysis, we identify several cancer-related piRNA modules. The most important biological function of the targets of hub piRNAs in these modules is "regulation of cell migration", which suggests that piRNAs are associated with cancer metastasis. Although several studies establish that piRNAs are valuable cancer biomarkers, their role remains difficult to understand, and the interplay between individual piRNAs and piRNA groups in tumorigenesis requires further investigation. To identify cancer-associated piRNA modules, we applied a systems-biology method to piRNA expression data from 11 cancer types. The method combines WGCNA and functional enrichment analysis. The results indicate that these piRNA modules are significantly associated with cancer. A module with a high correlation coefficient (cor = -0.83, p = 1.86E-128) was found, particularly in HNSC, and the piRNAs in this module may contribute to tumor inhibition. Moreover, genes associated with hub piRNAs in the modules were predicted, and functional analysis of these genes helps explain the relationship between piRNAs and cancer. We find that hub piRNAs in the modules can contribute to cancer metastasis.
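Contribution 1 simulates case-control piRNA expression data and screens it with SAM, but the abstract does not give the baseline distribution or the form of the six risk models. The sketch below is therefore only an illustration under stated assumptions: Gaussian log-scale baselines, and a single hypothetical additive risk model in which a random fraction `prevalence` of case samples carries an upward shift on a few chosen piRNAs. `simulate_case_control` and `sam_d_scores` are illustrative names, not the thesis code. The score is SAM's relative difference d_i = (mean_case - mean_ctrl) / (s_i + s0), with the fudge factor s0 fixed rather than tuned.

```python
import random
import statistics


def simulate_case_control(n_pirnas=200, n_cases=50, n_controls=50,
                          risk_ids=(0, 1, 2, 3, 4), effect=2.0,
                          prevalence=1.0, seed=0):
    """Simulate log-scale piRNA expression with case-control labels.

    Hypothetical additive risk model: in a random fraction `prevalence` of
    case samples, every piRNA listed in `risk_ids` is shifted by `effect`.
    Returns (matrix, labels) where matrix[i][j] is piRNA i in sample j.
    """
    rng = random.Random(seed)
    labels = [1] * n_cases + [0] * n_controls
    # Baseline: piRNA i ~ N(mu_i, 1) in every sample, mu_i drawn once per piRNA.
    mus = [rng.uniform(2.0, 8.0) for _ in range(n_pirnas)]
    matrix = [[rng.gauss(mu, 1.0) for _ in labels] for mu in mus]
    # Inject the risk signal sample by sample (column j = sample j).
    for j, label in enumerate(labels):
        if label == 1 and rng.random() < prevalence:
            for i in risk_ids:
                matrix[i][j] += effect
    return matrix, labels


def sam_d_scores(matrix, labels, s0=0.1):
    """SAM-style relative difference d_i = (mean_case - mean_ctrl) / (s_i + s0).

    s0 is fixed here; SAM proper tunes s0 from the data and estimates the
    false discovery rate by permuting labels, neither of which is shown.
    """
    n1, n0 = labels.count(1), labels.count(0)
    scores = []
    for row in matrix:
        case = [x for x, y in zip(row, labels) if y == 1]
        ctrl = [x for x, y in zip(row, labels) if y == 0]
        pooled_var = ((n1 - 1) * statistics.variance(case)
                      + (n0 - 1) * statistics.variance(ctrl)) / (n1 + n0 - 2)
        s_i = (pooled_var * (1 / n1 + 1 / n0)) ** 0.5
        scores.append((statistics.mean(case) - statistics.mean(ctrl)) / (s_i + s0))
    return scores


if __name__ == "__main__":
    matrix, labels = simulate_case_control(seed=42)
    d = sam_d_scores(matrix, labels)
    ranked = sorted(range(len(d)), key=lambda i: abs(d[i]), reverse=True)
    print("top-ranked piRNA indices:", ranked[:5])
```

Lowering `prevalence` spreads the same shift over fewer case samples, which shrinks d for the risk piRNAs; this mirrors the abstract's observation that low-prevalence models are hard to detect.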
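The four sequence features named in contribution 2 (weighted k-mer, weighted k-mer with wildcards, position-specific base, length) can be sketched as a feature-extraction step. The abstract does not define the weights, the wildcard pattern, or how many positions are encoded, so the choices here are assumptions: weight k/k_max for k-mers of length k, one wildcard at the middle position of each k-mer, and one-hot bases at the first 10 positions. `pirna_features` and its helpers are hypothetical names. The vectors would then be passed to an SVM, for example scikit-learn's `sklearn.svm.SVC`, which is omitted to keep the sketch dependency-free.

```python
from itertools import product

ALPHABET = "ACGU"


def kmer_vector(seq, k, wildcard=False):
    """Relative frequency of every k-mer over ACGU.

    With wildcard=True the middle position of each k-mer is replaced by '*',
    so e.g. 'ACG' and 'AUG' both count toward 'A*G' (a gapped k-mer).
    """
    mid = k // 2
    if wildcard:
        keys = ["".join(p[:mid]) + "*" + "".join(p[mid:])
                for p in product(ALPHABET, repeat=k - 1)]
    else:
        keys = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(keys, 0)
    n_windows = max(len(seq) - k + 1, 0)
    for i in range(n_windows):
        word = seq[i:i + k]
        if wildcard:
            word = word[:mid] + "*" + word[mid + 1:]
        if word in counts:  # windows containing N or other symbols are skipped
            counts[word] += 1
    return [counts[key] / max(n_windows, 1) for key in keys]


def position_vector(seq, n_pos=10):
    """One-hot encoding of the base at each of the first n_pos positions."""
    vec = []
    for i in range(n_pos):
        base = seq[i] if i < len(seq) else None  # zero-pad short sequences
        vec.extend(1.0 if base == b else 0.0 for b in ALPHABET)
    return vec


def pirna_features(seq, k_max=3, n_pos=10, weights=None):
    """Weighted k-mers + weighted wildcard k-mers + position bases + length."""
    seq = seq.upper().replace("T", "U")
    if weights is None:
        weights = {k: k / k_max for k in range(1, k_max + 1)}  # hypothetical weights
    vec = []
    for k in range(1, k_max + 1):
        vec.extend(weights[k] * x for x in kmer_vector(seq, k))
    for k in range(3, k_max + 1):  # wildcards only make sense for k >= 3
        vec.extend(weights[k] * x for x in kmer_vector(seq, k, wildcard=True))
    vec.extend(position_vector(seq, n_pos))
    vec.append(float(len(seq)))
    return vec
```

For an arbitrary 27-nt test sequence such as `"UGCAUCGAUUACGGAUCCAGUAAGCUA"`, `pirna_features` with `k_max=3` yields 4 + 16 + 64 monomer-to-trimer frequencies, 16 wildcard trimers, 40 position bits and the length, 141 values in total.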
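Contribution 3 identifies piRNA–mRNA and piRNA–lncRNA pairs by integrating sequence and expression data. One minimal way to express that integration is sketched below, assuming that a candidate pair must (a) contain a target site complementary to the piRNA's nucleotides 2-8 and (b) be negatively co-expressed across the same tumor samples. The negative-correlation requirement follows from the cleavage mechanism the abstract mentions: a piRNA that cleaves its target should tend to lower the target's abundance. The miRNA-style seed window, the cutoff `r_max = -0.3`, and the names `predict_pairs` and `has_seed_site` are hypothetical stand-ins; the abstract does not state the thesis's actual criteria.

```python
import statistics

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def has_seed_site(pirna, target, start=1, end=8):
    """True if target contains the reverse complement of pirna[start:end],
    i.e. nucleotides 2-8 (a miRNA-style seed used here as a stand-in rule)."""
    return reverse_complement(pirna[start:end]) in target


def predict_pairs(pirnas, targets, expr, r_max=-0.3):
    """Keep (piRNA, target, r) pairs with a seed site AND negative co-expression.

    pirnas, targets: name -> sequence; expr: name -> expression values
    measured over the same samples (hypothetical input layout).
    """
    pairs = []
    for p_name, p_seq in pirnas.items():
        for t_name, t_seq in targets.items():
            if not has_seed_site(p_seq, t_seq):
                continue  # cheap sequence filter first prunes most pairs
            r = pearson(expr[p_name], expr[t_name])
            if r <= r_max:
                pairs.append((p_name, t_name, round(r, 3)))
    return pairs
```

Applying the sequence filter before computing correlations keeps the pairwise search cheap when the number of candidate targets is large.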
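Contribution 4 relies on WGCNA to find piRNA modules and their hub piRNAs. The core network quantities can be sketched in plain Python: the unsigned adjacency a_ij = |cor(x_i, x_j)|^beta, the topological overlap TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij) with l_ij = sum over u of a_iu * a_uj, and the hub as the member with the highest intramodular connectivity. This is a simplified sketch, not the WGCNA package: beta = 6 is assumed rather than chosen by a scale-free fit, module labels are taken as input instead of being derived by hierarchical clustering of 1 - TOM, and the function names are hypothetical.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)| ** beta, with a_ii = 0."""
    names = list(expr)
    return {i: {j: 0.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta
                for j in names}
            for i in names}


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), l_ij = sum_u a_iu * a_uj."""
    names = list(adj)
    k = {i: sum(adj[i].values()) for i in names}  # connectivity (a_ii = 0)
    tom = {}
    for i in names:
        tom[i] = {}
        for j in names:
            if i == j:
                tom[i][j] = 1.0
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in names if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


def hub_per_module(adj, modules):
    """modules: name -> module label. The hub is the member with the highest
    intramodular connectivity (sum of adjacencies to the other members)."""
    hubs = {}
    for label in sorted(set(modules.values())):
        members = [n for n, m in modules.items() if m == label]
        conn = {n: sum(adj[n][m] for m in members if m != n) for n in members}
        hubs[label] = max(conn, key=conn.get)
    return hubs
```

Raising correlations to the power beta suppresses weak correlations while keeping strong ones, which is why hub selection emphasizes piRNAs that are tightly co-expressed with many module members.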
Keywords/Search Tags: piRNA, bioinformatics, simulation, identification, RNA-RNA interaction, cancer