Using Convolutional Neural Networks To Identify Gene Expression Levels Based On Transcription Factor Signals

Posted on: 2022-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Yang
GTID: 2480306509961299
Subject: Physics
Abstract/Summary:
In the human genome, gene expression levels and expression patterns show great diversity. Although gene expression is known to be controlled by transcription factors (TFs) and other regulatory factors, deciphering its complexity remains an arduous task. In eukaryotes, multiple transcription factors can bind synergistically to different cis-regulatory elements to regulate the expression level of target genes. Because experiments can reveal TF interactions and their regulation of gene expression only on a limited scale, it is necessary to clarify the mechanism by which TF combinations regulate gene expression. Based on the RNA-seq expression profile data of GM12878 and K562, the RPKM values of genes were calculated, and a highly expressed gene (HEG) set and a lowly expressed gene (LEG) set were constructed for each cell line. Then, using the ChIP-seq peak data of 74 transcription factors in GM12878 and 93 transcription factors in K562, the gene-TF matrices of the HEG and LEG sets were obtained by calculating the transcription factor association scores (TFAS) of target genes in the two cell lines. According to the Pearson correlation coefficient (PCC) values of transcription factor pairs derived from the gene-TF matrices, the TF interaction networks of the HEG and LEG sets were constructed using Cytoscape. As a result, 6 TF modules closely associated with gene expression level were found in each cell line. Next, based on the signal enrichment characteristics of the TFs in these modules within a 10 kb window around a target gene, a TF signal-based convolutional neural network (TFCNN) model was constructed to identify gene expression levels. The results show that the TF signals of these modules can be used to identify gene expression levels more accurately, and that the TF modules related to gene expression give higher accuracy than the TF modules shared between the HEG and LEG sets. The average ROC-AUC values reach up to 97.47% and 97.61%, respectively. In addition, we analyzed the influence of the size of the signal interval (bin) on the prediction of gene expression level. Comparison with other methods shows that the convolutional neural network model performs best.
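
Two of the quantities named in the abstract, RPKM and the PCC of TF pairs, can be illustrated with a minimal sketch. It uses the standard RPKM definition and the textbook Pearson formula; the abstract does not state how either was computed in the thesis, and it does not define the TFAS. The function names, the TF labels (TF_A, TF_B, TF_C) and the toy matrix values are therefore hypothetical placeholders, not data or code from the thesis.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def tf_pair_correlations(gene_tf_matrix, tf_names):
    """PCC for every TF pair; rows are genes, columns are TFs."""
    columns = list(zip(*gene_tf_matrix))  # one tuple of scores per TF
    pairs = {}
    for i in range(len(tf_names)):
        for j in range(i + 1, len(tf_names)):
            pairs[(tf_names[i], tf_names[j])] = pearson(columns[i], columns[j])
    return pairs


# Toy gene-TF matrix (4 genes x 3 TFs); values are placeholders, not TFAS
# values from the thesis.
toy_matrix = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 1.0],
]
toy_tfs = ["TF_A", "TF_B", "TF_C"]
toy_pccs = tf_pair_correlations(toy_matrix, toy_tfs)
```

In a workflow like the one described, TF pairs whose PCC exceeds some chosen cutoff would become edges of the interaction network. The abstract does not give that cutoff, so none is assumed here.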
Keywords/Search Tags: Transcription factor, gene expression level, Pearson correlation coefficient, convolutional neural network, prediction