| At present, bioinformatics research on the relationship between aging-related diseases and genes mainly relies on building multi-label machine learning models to classify each gene. Most existing methods for predicting pathogenic genes either rely on a specific type of gene feature or feed multiple features of different dimensions into a single shared encoder, concatenate them, and predict the final result, which limits the applicability of the algorithm. Possible shortcomings include incomplete coverage of gene features by a single type of omics data, overfitting of a single encoder on low-dimensional datasets, and underfitting on higher-dimensional datasets. First, this study uses known gene-disease association data and gene descriptors, such as Gene Ontology (GO) terms, protein-protein interaction (PPI) data, PathDIP, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), as input to deep learning models that predict associations between genes and diseases. Our innovation is to use the Mashup algorithm to reduce the dimensionality of PPI, GO, and other large biological networks, add new pathway data from the KEGG database, and then combine these biological information sources through a modular deep neural network (DNN) to predict genes related to aging diseases. Second, dimensionality reduction of multi-omics data currently depends mainly on machine learning methods or on the last layer of an ordinary fully connected neural network. Neural networks have strong expressive power, but they cannot check whether the embedded data expresses the same information as the original data. A variational autoencoder, through its encoding and decoding steps, can compare the reconstructed data with the original data and thereby learn a more accurate low-dimensional representation. We concatenate the dimensionally reduced multi-omics data and then use a neural network to learn and predict, which improves the performance of the algorithm. Experiments show that the proposed algorithm outperforms the existing modular neural network algorithm (an improvement from 0.8795 to 0.9153), a gradient boosted tree classifier (a strong baseline method), and a logistic regression classifier. In this thesis, we learned the genes related to known diseases from a complex multidimensional feature space and found literature-supported evidence of association between the predicted genes and specific diseases. |
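
The per-source encoding and concatenation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the weights are random and untrained, the encoder and decoder are single linear layers, and the source names, feature dimensions, and latent size (`sources`, `latent_dim`, `init_vae`, `vae_forward`) are hypothetical values chosen for illustration. It only shows how a variational autoencoder combines a reconstruction term with a KL term for each data source, and how the per-source latent means are concatenated into one feature matrix for a downstream classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_vae(in_dim, latent_dim):
    """Random weights for a one-layer linear encoder/decoder (illustrative only)."""
    scale = 0.1
    return {
        "W_mu": rng.normal(0.0, scale, (in_dim, latent_dim)),
        "W_logvar": rng.normal(0.0, scale, (in_dim, latent_dim)),
        "W_dec": rng.normal(0.0, scale, (latent_dim, in_dim)),
    }

def vae_forward(x, params):
    """Encode, sample with the reparameterization trick, decode, and score."""
    mu = x @ params["W_mu"]
    logvar = x @ params["W_logvar"]
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps          # z ~ N(mu, sigma^2)
    x_hat = z @ params["W_dec"]
    # Reconstruction term: squared error between original and decoded data.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    return mu, recon, kl

# Hypothetical feature matrices: one row per gene, one matrix per data source.
n_genes = 8
sources = {
    "ppi": rng.normal(size=(n_genes, 50)),
    "go": rng.normal(size=(n_genes, 30)),
    "kegg": rng.normal(size=(n_genes, 20)),
}
latent_dim = 4

embeddings = []
losses = {}
for name, x in sources.items():
    params = init_vae(x.shape[1], latent_dim)    # one encoder per source
    mu, recon, kl = vae_forward(x, params)
    embeddings.append(mu)                        # latent mean as the embedding
    losses[name] = recon + kl                    # objective a trainer would minimize

# Concatenate the reduced representations for the downstream neural network.
joint = np.concatenate(embeddings, axis=1)
print(joint.shape)
```

In a real pipeline each encoder would be trained by minimizing its loss before the embeddings are extracted, and `joint` would be passed to the multi-label classifier.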