Prediction And Analysis Of Essential Genes Based On BP Neural Network

Posted on: 2018-11-09  Degree: Master  Type: Thesis
Country: China  Candidate: L Xu  Full Text: PDF
GTID: 2310330536968695  Subject: Master of Engineering
Abstract/Summary:
Essential genes (EGs) are the genes indispensable for the survival of an organism. Identifying EGs correctly is of great significance for studying the minimal conditions for the survival of a species, revealing the relationships between species during evolution, and identifying potential targets for antimicrobial drugs. Identifying EGs with experimental methods is often expensive and time-consuming, and the results are easily affected by the experimental environment. In addition, different experimental methods may yield different results. There is therefore a strong need to identify EGs with computational methods, among which machine learning is commonly used. As one machine learning method, the back-propagation neural network (BPNN) has shown good fault tolerance and a good ability to handle complicated nonlinear problems, so it has been widely applied in practical production and in biology and bioinformatics. However, existing research on identifying EGs with artificial neural networks (ANNs) has some limitations: (I) the organisms studied were limited; (II) the prediction methods were limited; (III) previous researchers did not apply methods to screen and study key features based on gene sequence.

In this thesis, BPNN models were applied to predict EGs following the common pattern recognition process. The research objects were genomes downloaded from the NCBI and DEG databases. The features included: (I) sequence-based features; (II) features generated by a method based on text classification. The main work is as follows:

(1) Applying ANN to predict EGs based on 57 features. The genomes of 31 bacteria were used as the objects of analysis, and 57 sequence-based features were generated manually. Since non-essential genes far outnumbered EGs, two strategies were applied to reduce the data imbalance. Guided by previous studies and theory, appropriate BPNN parameters were chosen through repeated experiments. Several commonly used modified training functions were also applied and compared. After determining the ANN model parameters, four experiment methods were carried out: 1) Self Prediction of Each Organism, 2) Leave-One-Genome-Out, 3) Predicting All by One Organism, 4) Self Prediction of All Organisms. Finally, the results were evaluated by 10-fold cross-validation and analyzed.

(2) Studying key features screened by the WPCA method. Based on the contribution of each feature in Principal Component Analysis (PCA), we proposed applying the WPCA method to compute feature importance and select the features with the highest importance as key features. The analysis showed that features associated with gene composition were often more important. In this study, the WPCA method reduced the number of features from 57 to 26 while performance in the four experiment methods remained generally stable, which indicates that it can screen key sequence-based features effectively. Moreover, the prediction results after feature extraction remained fairly consistent, which indicates that redundant features existed and that the WPCA method can reduce this redundancy and the time required to predict EGs.

(3) Generating features based on text classification. Generating features manually is time-consuming and laborious, and it is very difficult to find new features. Based on text classification, features could be produced in batches through steps such as gene representation, feature term selection, and weighting. Finally, these features were applied to predict EGs with ANN in the four experiment methods.
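
Of the four experiment methods, Leave-One-Genome-Out is the one whose data split follows most directly from its name: each genome is held out once as the test set while the genes of all remaining genomes form the training set. The following is a minimal sketch of that split only, not the thesis's implementation. The function name `leave_one_genome_out`, the `genome_ids` list, and the toy labels are hypothetical, and the BPNN training step is omitted.

```python
from collections import defaultdict


def leave_one_genome_out(genome_ids):
    """Yield (held_out, train_idx, test_idx) once per distinct genome.

    genome_ids[i] is the (hypothetical) genome label of gene i.
    """
    # Group gene row indices by the genome they belong to.
    by_genome = defaultdict(list)
    for idx, gid in enumerate(genome_ids):
        by_genome[gid].append(idx)

    # Hold out each genome in turn; train on the genes of all the others.
    for held_out in sorted(by_genome):
        test_idx = by_genome[held_out]
        train_idx = sorted(
            i
            for gid, rows in by_genome.items()
            if gid != held_out
            for i in rows
        )
        yield held_out, train_idx, test_idx


# Toy example: five genes drawn from three made-up genomes.
genome_ids = ["A", "A", "B", "C", "B"]
splits = list(leave_one_genome_out(genome_ids))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Each yielded pair of index lists can be used to slice a feature matrix and label vector before fitting a model, so that no gene from the held-out genome is seen during training.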
Keywords/Search Tags: Essential Genes, Prediction, Neural Network, Feature Selection, Text Classification