
Imputation And Toxicity Effect Studies Based On Single-cell Sequencing Data

Posted on: 2024-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Xiong
Full Text: PDF
GTID: 2530307142981769
Subject: Software engineering
Abstract/Summary:
Many substances found in nature are toxic and damage living organisms. Analyzing the components of a substance helps to characterize its toxicity and toxic effects, but it does not by itself explain how the substance acts inside the organism. With advances in biomedical and computer technology, the mechanism of action on an organism can now be studied at the level of cells and genes, which helps to reveal the toxic mechanism of a substance. Among these technologies, single-cell transcriptome sequencing has been one of the most active over the past decade, with numerous achievements in gene expression, cell phenotype, disease analysis and other research fields. Applying it to the analysis of how toxic substances act on the body is therefore both feasible and meaningful.

This thesis starts from the toxic substances themselves and explores the relationship between compound composition and toxicity. Three machine learning algorithms (support vector machine, Random Forest and LightGBM) were used to model 12 compound data sets with different components in order to predict compound toxicity. Single-cell sequencing data were then used to study the mechanism of toxic substances on the body at the cell and gene levels. Single-cell sequencing data suffer from technical noise, in particular missing values, and the thesis addresses this problem directly. Building on the negative binomial distribution model and on the DCA and DeepImpute methods, it proposes ND-Impute, a missing-value imputation method for single-cell data that uses deep learning to process the data and integrates a statistical model to impute the missing values. To verify the validity of ND-Impute, five commonly used single-cell transcriptome data sets were imputed and analyzed downstream, and the results were compared with those of two other deep-learning-based imputation methods, DCA and DeepImpute.

Finally, nicotine was taken as an example of an actual toxic substance, and its effects on human embryonic stem cells were analyzed with single-cell transcriptome sequencing. The raw single-cell sequencing data sets of nicotine-exposed and control embryonic stem cells were first processed with the proposed ND-Impute method. Cluster analysis, pseudo-time analysis and differential expression analysis were then used to compare cell development and gene expression between nicotine-exposed and unexposed cells.

For toxicity prediction, the models built with the three machine learning algorithms reached an average accuracy above 90% on the 12 data sets. LightGBM in particular gave the best classification performance and could effectively analyze and predict the toxicity of toxic substances. Compared with DCA and DeepImpute, ND-Impute separated the five cell groups more clearly in cluster analysis and gave better correlation values. It was also more accurate, with relatively low error, indicating that the method is effective at imputing missing values in single-cell data.

The nicotine study, carried out with ND-Impute and single-cell data analysis, found that nicotine has adverse effects on human cells and genes. At the cellular level, the developmental process was affected: in the clustering results the nicotine exposure group contained fewer cells than the control group, and in the pseudo-time analysis the developmental trajectory of the cells was reduced. At the gene level, the nicotine exposure group and the control group showed different gene expression, with clearly identifiable differentially expressed genes whose expression levels differed significantly.
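
The abstract does not give the formulation of ND-Impute, so the following is only a minimal sketch of the negative binomial count model that methods such as DCA build on, not the author's implementation. It uses the common mean/dispersion parameterization; the function names `nb_log_pmf` and `nb_zero_prob` and the example values of `mu` and `theta` are assumptions made for illustration. The sketch shows why a statistical model is useful for imputation: it gives the probability that an observed zero is a genuine zero count under the fitted distribution, as opposed to a missing value.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial distribution.

    mu    : mean of the distribution (must be > 0)
    theta : dispersion; the variance is mu + mu**2 / theta (must be > 0)
    """
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def nb_zero_prob(mu, theta):
    """Probability of observing a zero count under NB(mu, theta)."""
    return math.exp(nb_log_pmf(0, mu, theta))


if __name__ == "__main__":
    # Hypothetical gene with mean expression 2.0 and strong overdispersion.
    mu, theta = 2.0, 0.5
    print("P(count = 0):", nb_zero_prob(mu, theta))
    print("variance    :", mu + mu ** 2 / theta)
```

With a small dispersion value, a zero count is fairly likely even for a gene with nonzero mean expression, which is why zeros in single-cell data cannot all be treated as missing values.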
Keywords/Search Tags: Toxic Effect, Machine Learning, Single-cell RNA sequencing, Imputation, Deep Learning