
The Research On Dimensionality Reduction Algorithms Based On Intermolecular Relationship

Posted on: 2022-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: S N Bai
Full Text: PDF
GTID: 2480306509484654
Subject: Computer Science and Technology
Abstract/Summary:
High-dimensional, complex biological data contain a great deal of information closely related to life and health. Such data typically have a small sample size and high dimensionality. Effectively reducing the dimensionality and extracting the important information is therefore of great significance for disease diagnosis, drug research and development, personalized medicine, and related applications. Because organisms are themselves complex, molecules interact with one another in complex ways. This dissertation uses these intermolecular interactions to extract important information from complex biological data from two perspectives: feature selection and feature extraction. The main contents of this dissertation are as follows:

1. A feature module search algorithm, MSIG, based on a synergetic network is proposed. Interaction gain is used to construct a weighted synergetic network. While searching for important modules, the edge aggregation coefficient and the edge weight are used to measure how tightly a candidate node aggregates with the current feature module. The network topology information is then combined with each node's own classification performance to search for information-rich network modules. Experimental results on 10 public datasets show that MSIG can effectively select modules that are information-rich and synergistic, and that in most cases it outperforms feature selection methods operating at the molecular level and at the network level.

2. A feature extraction algorithm, VAMCN, a variational autoencoder based on a correlation network, is proposed. In this method, the correlation network is constructed using the Spearman coefficient, and the importance of a feature is measured by its own variance together with the variances of its neighbors in the network. The essential feature subset is then selected. To make the network structure sparse, a network layer derived from the Spearman correlation network of the input data is embedded between the input layer and the first hidden layer. In addition, the negative log-likelihood of a multinomial distribution over the input data is used as part of the reconstruction loss. Experimental results show that VAMCN outperforms VASC, another algorithm based on a variational autoencoder, as well as other commonly used algorithms such as PCA, t-SNE, and ZIFA.

The two algorithms proposed in this dissertation are both feature dimensionality reduction methods for bioinformatics data. Both reduce the dimensionality of the original high-dimensional data by taking the interactions between molecules into account. MSIG is a feature selection algorithm for identifying biomarkers. VAMCN is a feature extraction technique for extracting complex abstract features.
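The correlation-network step described for VAMCN can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the edge threshold, the mixing weight `alpha`, the use of the mean neighbor variance, and all function names. It is not the dissertation's implementation. Spearman correlation is computed as the Pearson correlation of ranks, with ties broken by position rather than averaged.

```python
import numpy as np


def spearman_matrix(X):
    """Spearman correlation between the columns (features) of X.

    X has shape (n_samples, n_features). Ranks come from a double
    argsort, so ties are broken by position rather than averaged.
    """
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def correlation_network(X, threshold=0.5):
    """Boolean adjacency matrix: edge where |rho| >= threshold (assumed rule)."""
    rho = spearman_matrix(X)
    adj = np.abs(rho) >= threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def feature_importance(X, adj, alpha=0.5):
    """Combine a feature's own variance with its neighbors' mean variance.

    The weighted sum with `alpha` is a hypothetical choice; the source
    only states that both quantities are used.
    """
    var = X.var(axis=0)
    degree = adj.sum(axis=1)
    neigh_sum = adj.astype(float) @ var
    # Isolated features (degree 0) get a neighbor term of 0.
    neigh_mean = np.divide(neigh_sum, degree,
                           out=np.zeros_like(var), where=degree > 0)
    return alpha * var + (1.0 - alpha) * neigh_mean


def select_features(X, k, threshold=0.5, alpha=0.5):
    """Indices of the k highest-scoring features."""
    adj = correlation_network(X, threshold)
    score = feature_importance(X, adj, alpha)
    return np.argsort(score)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 20))       # 50 samples, 20 features (toy data)
    print(select_features(X, k=5, threshold=0.3))
```

The same adjacency matrix could in principle serve as a sparsity mask on the weights between the input layer and the first hidden layer, which is one plausible reading of the embedded network layer described above.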
Keywords/Search Tags: Feature Selection, Variational Autoencoder, Molecular Relationship, Feature Dimension Extraction