Gene Expression Prediction Based On BP And LSTM Neural Networks

Posted on: 2020-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
GTID: 2370330596985807
Subject: Software engineering
Abstract/Summary:
With the rapid development of life science and computer science, biological data has grown exponentially. This growth greatly enriches the data resources of bioinformatics in both quality and quantity and provides a data foundation for unlocking the mysteries of life. As a major breakthrough in molecular biology, gene chip technology has been applied to measuring gene expression levels and has become one of the important tools for exploring the nature of life. A gene expression profile is obtained from a gene chip by hybridizing labeled complementary sequences to probes. Because gene expression differs under different conditions, gene expression profiling can be used for environmental detection and prevention, drug screening, gene function discovery, complex disease diagnosis, personalized treatment, crop optimization, and forensic identification. Studying gene expression profiles therefore has important theoretical and practical significance.

Although the cost of acquiring genome-wide expression profiles is gradually declining, generating thousands or even tens of thousands of gene expression profiles with gene chip technology involves complicated biological procedures and is too costly for a typical laboratory. NIH LINCS researchers analyzed about 1,000 carefully selected landmark genes and relied on linear regression to infer the expression of the remaining target genes. However, linear regression ignores the nonlinear characteristics of gene expression profile data, so it cannot predict gene expression accurately. A BP neural network can extract relatively complex nonlinear mappings between input and output data, and an LSTM neural network can capture interactions among the input data; combining the two networks makes it possible to extract high-level feature representations from the original data. Most gene expression profile data sets, however, have few samples and high dimensionality, so a deep learning algorithm fitted to them overfits easily. To address these problems, BP and LSTM neural networks are used to extract the nonlinear characteristics of gene expression profile data, and a Transfer Learning strategy and regularization techniques are introduced, which effectively addresses the tendency of deep learning algorithms to overfit on small data sets.

On this basis, this thesis studies gene expression prediction based on DCIO-BP and LSTM. The research contents are as follows:

(1) The original gene expression profiles are high-dimensional and contain redundant and irrelevant genes, so this thesis uses the unsupervised clustering algorithm K-means to remove redundancy from the original gene expression profile data. To eliminate expression changes caused by experimental technique and to bring the data of each sample and its parallel experiments to the same level, the de-duplicated data are then standardized and normalized, which prepares them for constructing the regression prediction model.

(2) The traditional linear regression method for predicting gene expression ignores the nonlinear features between input and output data. This thesis therefore uses a BP neural network to automatically extract the nonlinear features between landmark genes and target genes, and then adds a direct connection from the input to the output, which brings the linear features between input and output data into the prediction model. The model thus considers both the linear and the nonlinear features between landmark genes and target genes, which improves its prediction ability.

(3) To improve the accuracy of gene expression regression prediction, this thesis uses the gate units of the LSTM neural network to capture the long-term dependence information of the landmark genes and combines it with the gene expression regression prediction model proposed in Chapter 3 to predict the expression of target genes. Introducing the Transfer Learning strategy and regularization techniques addresses the tendency of the deep learning model to overfit when fitting small data sets and improves the cross-platform prediction ability of the regression prediction model.
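
The idea in item (2), a BP network whose output also receives the input through a direct linear connection, can be sketched roughly as follows. This is only an illustrative NumPy sketch and is not taken from the thesis: the function names (`init_params`, `forward`, `train_step`), the single tanh hidden layer, the layer sizes, the learning rate, the L2 weight decay value, and the synthetic data standing in for landmark and target genes are all assumptions.

```python
import numpy as np


def init_params(n_in, n_hidden, n_out, rng):
    """Small random weights; all sizes here are illustrative assumptions."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),   # input -> hidden
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),  # hidden -> output
        "Ws": rng.normal(0.0, 0.1, (n_in, n_out)),      # direct input -> output
        "b2": np.zeros(n_out),
    }


def forward(params, X):
    """Prediction = nonlinear hidden path + direct linear path."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    Y_hat = H @ params["W2"] + X @ params["Ws"] + params["b2"]
    return Y_hat, H


def train_step(params, X, Y, lr=0.05, weight_decay=1e-4):
    """One full-batch backpropagation step on mean squared error.

    Returns the MSE measured before the update. L2 weight decay is applied
    to the weight matrices only, as a simple stand-in for regularization.
    """
    Y_hat, H = forward(params, X)
    err = Y_hat - Y
    loss = float(np.mean(err ** 2))
    d_out = 2.0 * err / err.size                       # dLoss/dY_hat
    d_hidden = (d_out @ params["W2"].T) * (1.0 - H ** 2)  # tanh derivative
    grads = {
        "W2": H.T @ d_out,
        "Ws": X.T @ d_out,
        "b2": d_out.sum(axis=0),
        "W1": X.T @ d_hidden,
        "b1": d_hidden.sum(axis=0),
    }
    for name, grad in grads.items():
        if name.startswith("W"):
            grad = grad + weight_decay * params[name]
        params[name] -= lr * grad
    return loss


# Synthetic stand-in data: 8 "landmark" inputs, 3 "target" outputs, with a
# linear part plus a mild nonlinearity. Real expression data is not used here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
Y = X @ rng.normal(size=(8, 3)) + 0.5 * np.sin(X[:, :3])

params = init_params(n_in=8, n_hidden=16, n_out=3, rng=rng)
losses = [train_step(params, X, Y) for _ in range(300)]
print("first loss:", losses[0], "last loss:", losses[-1])
```

In this sketch the `Ws` term plays the role of the linear regression baseline, while the hidden layer is free to model whatever nonlinear structure the linear path leaves unexplained.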
Keywords/Search Tags: Gene expression profiles, BP neural network, Transfer Learning, Regression prediction, LSTM neural network