Differential Analysis Methods For Transcriptome Sequencing Data And DNA Sequence Structure

Posted on: 2020-05-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Wang
GTID: 1360330590973041
Subject: Mathematics
Abstract/Summary:
Genes are DNA sequences that carry genetic information. Advances in sequencing technology have revealed fundamental truths in biology, and gene expression reflects the evolutionary processes of cells. With the emergence of RNA sequencing and single-cell sequencing, the diversity and heterogeneity of gene data and gene structure have become increasingly apparent. However, because gene data sets contain very large numbers of samples and gene structure is highly complex, analyzing gene data accurately remains a challenge, and selecting pathogenic genes is of great importance. This thesis focuses on the analysis of RNA sequencing data, single-cell time-series sequencing data, and the structure of DNA sequences. Its main contributions are as follows.

First, there is still no good way to compare RNA sequencing data sets with multiple groups of samples. Using information entropy theory, the thesis constructs an entropy-like function and develops a method based on it for identifying differentially expressed genes, called DEF. Compared with traditional methods such as DESeq2, edgeR, baySeq and limma, DEF applies to data sets with multiple groups, giving it a wider range of applications. It also retains the functionality of the traditional methods and can analyze differential expression between two groups. In addition, DEF can be applied to data sets with many zero expression values. As a result, it can detect differentially expressed genes that traditional methods do not detect. Experimental results show the effectiveness of DEF. Finally, DEF is applied to microRNA data from Huntington's disease to analyze differential microRNA expression between controls and cases, and possible biomarkers of Huntington's disease are predicted. The thesis further identifies microRNAs whose transcription is associated with disease burden, laying a foundation for the diagnosis and treatment of Huntington's disease.

Second, to address the uneven sampling of single-cell time-series data, the thesis develops a dynamic time warping score (DTWscore) for differential expression analysis of single-cell time-series sequencing data. DTWscore suits non-uniformly sampled single-cell time series and solves the problem of comparing two time series with non-uniform intervals. Its effectiveness is verified on simulated and real data sets. Furthermore, the differentially expressed genes identified by DTWscore are used to identify potential cell types and to cluster cells. As a tool for identifying differentially expressed genes and inferring potential cell types, DTWscore supports the study of biological processes.

Third, DNA sequences are structurally complex, and the topological entropy method can misjudge differences between finite-length DNA sequences. Building on the topological entropy theory of sequences, the thesis therefore proposes the vector topological entropy of a DNA sequence, which reduces these misjudgments in the differential analysis of finite-length DNA sequences. The thesis also shows that the vector topological entropy is not suitable for comparing DNA sequences of different lengths. It therefore proposes the K-quantity topological entropy of DNA sequences, which can be used to analyze differences between DNA sequences of different lengths. Numerical experiments verify the validity of the method. Furthermore, for DNA sequences of infinite length, the thesis proves that the generalized topological entropy of an infinite-length DNA sequence equals its topological entropy. A finite approximation method for the generalized topological entropy is also studied and its properties are analyzed. This finite approximation performs well on infinite-length sequences, and the theory of generalized topological entropy provides a new approach to the differential analysis of DNA sequence structure.
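The abstract does not give the form of the entropy-like function behind DEF, the first contribution. As a hypothetical illustration of the general idea only, the sketch below scores one gene across any number of groups. It uses the normalized Shannon entropy of the gene's group means: a score near 1 means expression is spread evenly over the groups, while a score near 0 means it is concentrated in one or a few groups. This is a standard entropy-based specificity score, not the thesis's DEF function. The name `group_entropy`, the use of group means, and the handling of all-zero genes are assumptions. A real analysis would also need normalization across samples and a significance threshold.

```python
import math
from typing import Sequence


def group_entropy(groups: Sequence[Sequence[float]]) -> float:
    """Normalized Shannon entropy of one gene's mean expression across groups.

    Hypothetical illustration only, NOT the thesis's DEF function.
    Returns a value in [0, 1]: near 1 means expression is spread evenly over
    the groups, near 0 means it is concentrated in one or a few groups.
    """
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    means = [sum(g) / len(g) for g in groups]
    if any(m < 0 for m in means):
        raise ValueError("expression values must be non-negative")
    total = sum(means)
    if total == 0:
        return 1.0  # silent in every group: treated as not group-specific (assumption)
    probs = [m / total for m in means]
    # Groups with zero mean contribute nothing (the usual 0 * log 0 = 0 convention)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(k)  # log(k) is the maximum entropy over k groups


if __name__ == "__main__":
    # Toy counts for two genes in three groups (three samples per group)
    flat = [[10, 12, 11], [9, 11, 10], [10, 10, 12]]
    specific = [[0, 0, 1], [0, 0, 0], [50, 60, 55]]
    print(f"{group_entropy(flat):.3f}")      # close to 1: not group-specific
    print(f"{group_entropy(specific):.3f}")  # close to 0: candidate differentially expressed gene
```

Because the score is defined for any number of groups k ≥ 2, the same function covers both the multi-group and the two-group case. Groups with all-zero counts are handled without pseudocounts.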
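The second contribution, DTWscore, is built on dynamic time warping (DTW). DTW aligns two time series by locally stretching or compressing time, so series of different lengths, or with shifted timing, can still be compared point by point. The following is a minimal sketch of classic DTW, assuming the absolute difference as the local cost. The abstract does not describe how DTWscore turns such alignments into a per-gene score, so that step is not reproduced. The name `dtw_distance` is hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two 1-D series.

    Hypothetical helper: uses |x_i - y_j| as the local cost (an assumption);
    the two series may have different lengths.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("series must be non-empty")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances while y stays on the same point
                cost[i][j - 1],      # y advances while x stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # Same rise-and-fall shape; y lingers at the start and at the peak
    x = [0, 1, 2, 3, 2, 1, 0]
    y = [0, 0, 1, 2, 3, 3, 2, 1, 0]
    z = [3, 2, 1, 0, 1, 2, 3]  # opposite trend
    print(dtw_distance(x, y))  # 0.0: warping absorbs the timing differences
    print(dtw_distance(x, z))  # positive: the shapes genuinely differ
```

The x and y series differ only in timing, so their warped distance is zero; a plain point-by-point comparison could not even be applied, since their lengths differ.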
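The third contribution builds on the topological entropy of a finite DNA sequence. The abstract does not state which definition the thesis starts from, so the sketch assumes a commonly used one:

1. For a sequence w over {A, C, G, T}, take the largest n with 4^n + n - 1 ≤ |w|.
2. Count the distinct length-n subwords p(n) in the first 4^n + n - 1 symbols.
3. Set H_top(w) = log_4 p(n) / n, which lies in [0, 1].

The vector and K-quantity topological entropies proposed in the thesis are not defined in the abstract and are not reproduced here. The function name is hypothetical.

```python
import math


def topological_entropy(w: str) -> float:
    """Topological entropy of a finite DNA string (assumed definition).

    Choose the largest n with 4**n + n - 1 <= len(w), take the prefix of
    length 4**n + n - 1, count its distinct length-n subwords p(n), and
    return log_4(p(n)) / n.
    """
    w = w.upper()
    if len(w) < 4 or set(w) - set("ACGT"):
        raise ValueError("expected a string over A, C, G, T of length >= 4")
    n = 1
    # Increase n while the next word length still fits: 4^(n+1) + (n+1) - 1 <= |w|
    while 4 ** (n + 1) + n <= len(w):
        n += 1
    prefix = w[: 4 ** n + n - 1]
    # The prefix has exactly 4**n windows of length n, so p(n) <= 4**n
    subwords = {prefix[i : i + n] for i in range(len(prefix) - n + 1)}
    return math.log(len(subwords), 4) / n


if __name__ == "__main__":
    print(topological_entropy("ACGT"))  # 1.0: all four bases occur
    print(topological_entropy("AAAA"))  # 0.0: only one subword
    # Length 17 gives n = 2; this string contains all 16 dinucleotides
    print(round(topological_entropy("AACAGATCCGCTGGTTA"), 6))  # 1.0
```

Under this definition, n is fixed by the sequence length. Sequences of different lengths can therefore be evaluated at different word lengths n.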
Keywords/Search Tags: Differential entropy-like function, Differentially expressed genes, Vector topological entropy, DNA sequence structure, Dynamic time warping