Study On Key Gene Discovery Technology And Analysis Tools By Transcriptome Data

Posted on:2023-04-13

Degree:Master

Type:Thesis

Country:China

Candidate:Q C Kang

GTID:2530306791481584

Subject:Biomedical engineering

Abstract/Summary:

Transcriptome-based data mining helps reveal the biological response mechanisms of drugs and diseases, and helps discover and interpret key genes in biomedical research. High-throughput sequencing produces massive amounts of data, which poses challenges for systematic bioinformatics analysis. First, it is difficult to construct an effective analysis process that extracts biological information from large-scale data. Second, although analysis methods for bulk RNA sequencing data are relatively mature, they still fall short in characterizing the biological significance of the data. Third, analysis methods for single-cell transcriptome sequencing data are developing rapidly: there are more than 100 automatic annotation tools for cell types alone, so efficiently selecting an appropriate method has become a constraint in the analysis process.

Relying on the in vivo drug and tissue-level biological response data constructed in earlier work, an analysis pipeline for biological response transcriptome data was built, and analytical tools were developed and evaluated. The research is divided into the following three parts.

First, the analysis process for transcriptome data was constructed from typical bioinformatics methods such as differential expression analysis, gene network screening, and functional enrichment. The main findings were as follows: a comparison of the self-built data with public data showed that the self-built data were of high quality and richer in biological significance; gene co-expression network screening found that the transcription factor-related target gene Top2a was at the center of the network under drug perturbation; extensive expression profiling identified andrographolide as a potential inhibitor of Ace2, the gene encoding the receptor of SARS-CoV-2; weighted co-expression network analysis found that the gene Stat1 lay within a gene module specifically associated with paeonol and was associated with the attenuation effect of anti-inflammatory compounds; and analysis of tissue response data to metformin revealed key genes in different tissues. During this analysis, current functional enrichment methods proved insufficient for characterizing gene sets, and the cell type detection algorithms for single-cell transcriptome data proved complicated to choose among. This work is discussed mainly in the first chapter.

Second, the PFP toolkit was developed on the RStudio platform. Biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database were downloaded to construct the base map of the gene network. The PFP toolkit maps target differential gene sets into biological pathways and calculates scores for the critical connections of genes in the pathway networks. By using gene connections in biological pathways to interpret gene sets, PFP provides an open and easy-to-use toolkit to the Bioconductor community. This work is discussed mainly in the second chapter.

Finally, methods for automated cell type annotation of single-cell transcriptome sequencing data were evaluated. Three automated annotation tools were combined with two annotation resolutions (cell cluster and single cell) to construct six methods, and gold standard datasets were collected to evaluate indicators such as annotation accuracy, missed detection rate, and running time. Methods that annotate at single-cell resolution were found to perform better in classifying immune-related cell subtypes. Among the six methods, scmap-cell had the best annotation accuracy, and SingleR-cluster showed the most stable accuracy. SingleR used a small-scale reference set for annotation, which gave it an obvious speed advantage. The performance of the automatic annotation methods in different application scenarios was summarized, and recommendations were made. This work is discussed mainly in the third chapter.

Based on the analysis and application of transcriptome data, this study established the relationships between drugs and the key genes found during the analysis, providing a reference for subsequent drug-related gene research. In terms of tool development, the biological pathway information characterized by the PFP toolkit is more comprehensive, and the biological significance of differential gene sets can be focused and screened more effectively, providing a reference for the development of gene set interpretation tools. In terms of tool recommendation, the existing automated cell type annotation methods were evaluated, and method recommendations for different application scenarios were provided. Overall, a relatively complete transcriptome data analysis process was constructed, laying the foundation for subsequent drug target omics and inflammatory network analysis.
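The evaluation indicators named in the abstract (annotation accuracy, missed detection rate, and running time) can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions used in the thesis, so the ones below are assumptions: accuracy is taken as the fraction of cells whose predicted label equals the gold standard label, and the missed detection rate as the fraction of cells the tool leaves unlabeled. The `UNASSIGNED` marker and the function names `evaluate_annotation` and `timed` are hypothetical and do not come from scmap, SingleR, or the thesis.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical marker for cells that a tool declines to label (assumption).
UNASSIGNED = "unassigned"


def evaluate_annotation(truth: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Compare predicted cell type labels with gold standard labels.

    Assumed definitions (not taken from the thesis):
      accuracy    = cells labeled correctly / all cells
      missed_rate = cells left unassigned   / all cells
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one cell is required")

    n_cells = len(truth)
    n_correct = sum(1 for t, p in zip(truth, predicted) if p == t)
    n_missed = sum(1 for p in predicted if p == UNASSIGNED)
    return {"accuracy": n_correct / n_cells, "missed_rate": n_missed / n_cells}


def timed(annotate: Callable[..., Sequence[str]], *args) -> Tuple[Sequence[str], float]:
    """Run an annotation function and return its result with elapsed seconds."""
    start = time.perf_counter()
    result = annotate(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the thesis.
    gold = ["T cell", "B cell", "NK cell", "T cell"]
    pred = ["T cell", "B cell", UNASSIGNED, "B cell"]
    print(evaluate_annotation(gold, pred))
```

Under these assumed definitions, an unassigned cell counts against accuracy as well as toward the missed detection rate; a study could equally report accuracy over assigned cells only, which would give different numbers for the same predictions.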

Keywords/Search Tags:

Transcriptome, Single cell transcriptome, Functional enrichment analysis, Biological response data, Automated annotation for cell type

Related items

1	Transcriptome Analysis Of Embryo Sac Component Cells And Cell-type-specific Gene Screening In Arabidopsis Thaliana
2	Single-cell Transcriptome-based Perturbation Effect Evaluation And Automatic Cell Type Identification
3	Data Analysis Method Of Single Cell Transcriptome Sequencing Based On Quality Control
4	Rare Cell Detection Methods Based On Single-cell Transcriptome Sequencing Data
5	Identification Of Platelets/Megakaryocytes In Single Cell Transcriptomic Data With The Assistance Of Machine Learning
6	Single-cell Transcriptome Analysis Of Two-cell Stage Mouse Embryos
7	Research On Denoising And Clustering Methods For Single Cell Transcriptome Data
8	Single Cell RNA Sequencing Reveals The Difference Of Transcriptomes In SCNT Embryos Derived From Different Somatic Cell Types
9	Comprehensive Analysis Of Omics Data For Plant Gene Structural Annotation And Functional Analysis Platform
10	Research On Cell Type Annotation Methods For Single-cell RNA-sequencing Data