The Research Of Similarity For LINCS Biological Data Via Metric Learning

Posted on: 2020-12-26 | Degree: Master | Type: Thesis
Country: China | Candidate: W Liu
GTID: 2370330623451858 | Subject: Software engineering
Abstract/Summary:
LINCS is a recently launched big-data program that records the responses of typical human cells to stimulation by small-molecule compounds. The data are rich and well structured, and the processing tools are mature. Because gene expression is highly correlated, exploring the similarity of LINCS gene expression profiles is valuable for gene inference, drug discovery, multi-omics data fusion analysis, pathway discovery, and related tasks. GSEA is currently the mainstream algorithm for studying the similarity of LINCS data. It requires the experimental results to be predicted first and then compared by calculation. Because this calculation is complex, GSEA struggles to meet the analysis demands of massive expression profile data in both similarity judgment and time overhead. Metric learning is a learning-based approach: it obtains a suitable metric space from training data, which makes it an ideal method for judging the similarity of expression profiles. However, there are currently very few metric learning models for expression profile data, and for LINCS similarity analysis in particular. On this basis, this thesis builds two different metric learning models for the similarity of LINCS data. It also proposes a new LINCS data classification method to extend the application of similarity judgment. The main work includes:

1. A gene expression profile distance metric algorithm based on an improved cosine distance. The thesis first proposes an optimized, h5py-based method for extracting LINCS data, and then finds that the cosine distance is a suitable similarity function. The cosine distance is then improved by centering and normalization, which makes the algorithm more sensitive to the values in each dimension. Combining this improved cosine distance with the NCA algorithm yields a neighborhood component analysis metric algorithm. Experiments on multiple datasets verify that the algorithm is suitable for similarity analysis of gene expression profiles.

2. A gene expression profile distance metric algorithm based on deep learning. Based on the Siamese framework, the thesis constructs a deep learning model that combines a DenseNet network with the cosine distance and extends implicit metric learning. A loss function combining center loss and cross-entropy loss is used, which reduces manual intervention while improving the discriminability of the high-level feature representations the model learns. A key point of this method is the data conversion step, which requires each gene expression profile to be converted into a gene matrix in advance. Experiments on data from multiple cell lines verify that the algorithm performs far better than commonly used metric learning methods and the GSEA algorithm.

3. A LINCS data classification algorithm based on shared dictionary learning. The thesis designs a shared dictionary learning model based on discriminant projection. A projection matrix is trained together with the dictionary, and projecting test samples with this matrix widens the distance between samples of different classes. In addition, the commonality among all categories is captured through the shared dictionary, which improves the discriminability of classification. Finally, the reconstruction error and the distance to the mean vector are used to determine the class of a sample. Experiments on multiple datasets verify that the classification accuracy of this method is higher than that of current mainstream classification methods.
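
The abstract does not give the exact form of the improved cosine distance, only that it involves centering and normalization. As a minimal sketch of one plausible reading (an assumption, not the thesis's definition), each expression profile can be mean-centered before the cosine similarity is computed, which makes the result equal to one minus the Pearson correlation. The function name `centered_cosine_distance` and the `eps` guard are hypothetical and used here for illustration only.

```python
import numpy as np

def centered_cosine_distance(x, y, eps=1e-12):
    """Cosine distance between two profiles after mean-centering each one.

    Illustrative assumption: the thesis's 'improved cosine distance' may
    differ in how it centers and normalizes the data.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Centering removes each profile's overall expression level.
    xc = x - x.mean()
    yc = y - y.mean()
    # Dividing by the norms rescales both profiles to unit length.
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    if denom < eps:
        # A constant profile has no direction after centering;
        # treat it as uncorrelated with everything.
        return 1.0
    return 1.0 - float(np.dot(xc, yc) / denom)

# Two toy profiles that differ only by a constant offset.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
d_same_shape = centered_cosine_distance(a, b)
d_opposite = centered_cosine_distance(a, [3.0, 2.0, 1.0])
```

Under this reading, profiles with the same shape but different baselines get a distance near 0, and perfectly anti-correlated profiles get a distance near 2. A plain cosine distance would not ignore the baseline offset in the same way.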
Keywords/Search Tags:LINCS, Similarity analysis, GSEA, Deep Learning, Metric Learning, Gene Expression Profile