Research on a Cross-Kingdom sRNA Data Analysis Method Based on High-Throughput Sequencing Data

Posted on: 2020-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: M P Zhan
GTID: 2370330575469939
Subject: Software engineering
Abstract/Summary:
Small RNAs (sRNAs) are non-coding small-molecule RNAs, tens to hundreds of nucleotides in length, that regulate gene expression by complementary pairing with target RNAs and take part in multiple biological processes in cells. Initially, sRNAs were difficult to detect because their genes are short. With the advent of high-throughput sequencing technology, more and more sRNAs have been discovered, and as important regulators they have gradually become a hotspot in life science research. Studies have confirmed that sRNAs play an important role in cross-kingdom regulation. Because the sequence and structure of an sRNA are closely related to its ability to enter host cells, analyzing the sequences and structures of cross-kingdom sRNAs to identify similar sRNAs can not only reveal how sRNA sequence and structure relate to function, but is also important for effectively identifying unknown cross-kingdom sRNAs. To date, sRNA research has focused on sequence analysis and target gene function recognition, while the study of cross-kingdom sRNAs remains at the stage of experimental verification, and every such study is aimed at specific species.

Therefore, building on traditional RNA data analysis methods, this thesis proposes a data analysis method for high-throughput sRNA data from fungi and plants and from plants and humans. Statistical methods are used to analyze the sequence and structural features of cross-kingdom sRNAs. By analyzing differentially expressed cross-kingdom sRNAs, molecular features that may influence sRNA entry into host cells for cross-kingdom regulation can be identified. A cross-kingdom sRNA identification model based on machine learning is then constructed to identify exogenous sRNAs that might be absorbed by the host.

The thesis first collects fungus and plant sRNA data and plant and human sRNA data, and performs a series of preprocessing steps such as quality control, adapter trimming, and gene identification. Machine learning methods are then used to extract the sequence and structural features of the sRNAs and to construct the cross-kingdom sRNA identification model, which can be used to identify sRNAs that can enter host cells. Finally, the biological functions of cross-kingdom sRNAs are analyzed by screening target genes, performing functional enrichment analysis, and mining gene interaction relationships.

Using the proposed method, sRNA data from Magnaporthe oryzae and rice, and from humans and plants, were analyzed. The model's accuracy is 84.5% for Magnaporthe oryzae and rice and 78.2% for plants and humans. The proposed method for analyzing cross-kingdom sRNA high-throughput sequencing data offers a new approach to studying the relationship between the ability of cross-kingdom sRNAs to enter host cells and their sequence and structural features. The thesis may provide new directions for future study of the cross-kingdom sRNA regulation mechanism, and has some guiding significance for research on crops, drugs, and diseases.
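The preprocessing and feature-extraction steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the tools, adapter sequence, or feature set that were used, so everything below is an assumption made for illustration: the function names `trim_adapter` and `sequence_features`, the `min_overlap` parameter, and the choice of features (length, GC content, 5' nucleotide, dinucleotide frequencies). Structural features and the classifier itself are left out.

```python
from collections import Counter
from itertools import product


def trim_adapter(read: str, adapter: str, min_overlap: int = 5) -> str:
    """Remove a 3' adapter from a read (hypothetical helper, not the thesis's tool).

    Cuts at the first full adapter match; otherwise cuts a partial adapter
    prefix of at least `min_overlap` bases found at the 3' end of the read.
    """
    read, adapter = read.upper(), adapter.upper()
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Look for the longest adapter prefix that the read ends with.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def sequence_features(seq: str) -> dict:
    """Compute simple sequence features for one sRNA (assumed feature set)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style reads
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    feats = {
        "length": n,
        "gc_content": (seq.count("G") + seq.count("C")) / n,
    }
    # One-hot encoding of the 5' terminal nucleotide.
    for base in "ACGU":
        feats[f"first_{base}"] = 1.0 if seq[0] == base else 0.0
    # Relative frequency of each of the 16 dinucleotides.
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    total = max(n - 1, 1)
    for a, b in product("ACGU", repeat=2):
        feats[f"di_{a}{b}"] = pairs[a + b] / total
    return feats
```

A feature dictionary of this kind could be turned into a numeric vector and passed to any standard classifier. Which learning algorithm the thesis used is not stated in the abstract.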
Keywords/Search Tags:Small RNA, High-throughput sequencing data, Sequence and structural features, Machine learning