Study On Key Gene Discovery Technology And Analysis Tools By Transcriptome Data

Posted on: 2023-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q C Kang
Full Text: PDF
GTID: 2530306791481584
Subject: Biomedical engineering
Abstract/Summary:
Transcriptome-based data mining helps reveal the biological response mechanisms of drugs and diseases and helps discover and interpret key genes in biomedical research. High-throughput sequencing technology produces massive amounts of data, which poses challenges for systematic bioinformatics analysis. First, it is difficult to construct an effective analysis workflow that extracts biological information from large-scale data. Second, although analysis methods for bulk RNA sequencing data are relatively mature, they still fall short in characterizing the biological significance of the data. Third, analysis methods for single-cell transcriptome sequencing data are developing rapidly: there are more than 100 automatic annotation tools for cell types alone, so efficiently selecting an appropriate method has become a constraint in the analysis process. Building on in vivo drug and tissue-level biological response data constructed in earlier work, this study constructed an analysis pipeline for biological response transcriptome data, and developed and evaluated analytical tools. The main research contents were divided into the following three parts.
First, an analysis workflow for transcriptome data was constructed from typical bioinformatics methods such as differential expression analysis, gene network screening, and functional enrichment. The main findings were as follows: comparison of the self-built data with public data showed that the self-built data were of high quality and richer in biological significance; gene co-expression network screening found that the transcription factor-related target gene Top2a was at the center of the network under drug perturbation; extensive expression profiling identified andrographolide as a potential inhibitor of Ace2, the gene encoding the SARS-CoV-2 receptor; weighted co-expression network analysis found that the gene Stat1 lay within a gene module specifically associated with paeonol and was associated with attenuation by anti-inflammatory compounds; and analysis of tissue response data to metformin revealed key genes in different tissues. During data analysis, it became apparent that current functional enrichment methods were still insufficient for characterizing gene sets and that cell type detection algorithms for single-cell transcriptome data were complicated. This content is mainly discussed in the first chapter.
Next, the PFP toolkit was developed on the RStudio platform, and the biological pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database were downloaded to construct the base map of the gene network. The PFP toolkit mapped target differential gene sets onto biological pathways and calculated scores for the genes' critical connections in the pathway networks. By using gene connections in biological pathways to interpret gene sets, PFP provides an open and easy-to-use toolkit for the Bioconductor community. This content is mainly discussed in the second chapter.
Finally, methods for automated annotation of cell types in single-cell transcriptome sequencing data were evaluated. Three automated cell type annotation tools were each run at two resolutions (cell cluster and single cell), giving six methods, and gold standard datasets were collected to evaluate indicators such as annotation accuracy, missed detection rate, and running time. Methods that annotate at single-cell resolution were found to perform better in classifying immune-related cell subtypes. Among the six methods, scmap-cell had the best annotation accuracy, and SingleR-cluster showed the most stable accuracy. SingleR used a small-scale reference set for annotation, which gave it an obvious speed advantage. The performance of the automated annotation methods in different application scenarios was summarized, and recommendations were made. This content is mainly discussed in the third chapter.
Based on the analysis and application of transcriptome data, this study established the relationships between drugs and the key genes found during analysis, providing a reference for subsequent drug-related gene research. In terms of tool development, the biological pathway information characterized by the PFP toolkit is more comprehensive, and the biological significance of differential gene sets can be focused on and screened more effectively, providing a reference for the development of gene set interpretation tools. In terms of tool recommendation, the existing automated annotation methods for cell types were evaluated, and method recommendations for different application scenarios were provided. Overall, a relatively complete transcriptome data analysis workflow was constructed, laying the foundation for subsequent drug target omics and inflammatory network analysis.
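As a rough illustration of the evaluation indicators named in the abstract, the sketch below computes annotation accuracy and a missed detection rate from per-cell labels. It is a minimal sketch under stated assumptions, not the thesis's actual procedure: the function name `annotation_metrics`, the `"unassigned"` placeholder label, and the exact definitions (accuracy as the fraction of cells whose predicted label matches the gold standard, missed detection rate as the fraction of cells left unassigned) are all hypothetical, and the example labels are made up.

```python
from typing import Dict, Sequence

# Hypothetical placeholder that an annotation tool emits when it declines
# to assign a cell type; the real tools may use a different marker.
UNASSIGNED = "unassigned"


def annotation_metrics(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    unassigned: str = UNASSIGNED,
) -> Dict[str, float]:
    """Return accuracy and missed detection rate for per-cell annotations.

    Assumed definitions (not taken from the thesis):
      accuracy    = cells with predicted == true label / all cells
      missed_rate = cells labelled `unassigned` / all cells
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    n_cells = len(true_labels)
    if n_cells == 0:
        raise ValueError("at least one cell is required")

    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    missed = sum(p == unassigned for p in predicted_labels)
    return {"accuracy": correct / n_cells, "missed_rate": missed / n_cells}


# Toy example with made-up labels for four cells.
truth = ["T cell", "B cell", "NK cell", "T cell"]
predicted = ["T cell", "B cell", "unassigned", "B cell"]
metrics = annotation_metrics(truth, predicted)
print(metrics)
```

Under these definitions an unassigned cell counts against accuracy as well as toward the missed detection rate; a study could equally report accuracy over assigned cells only, so the choice should be stated alongside any results.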
Keywords/Search Tags: Transcriptome, Single-cell transcriptome, Functional enrichment analysis, Biological response data, Automated annotation for cell type