
A Study Of Joint Multi-omics Analysis Based On Clustering Analysis And Integrated Learning

Posted on: 2024-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: S P Huang
Full Text: PDF
GTID: 2543307160479664
Subject: Applied Statistics
Abstract/Summary:
Carnations are highly ornamental, and their senescence is manifested mainly in the petals. There is little research on the genes that regulate carnation senescence, and the underlying molecular mechanisms are poorly understood. Because carnations are sensitive to ethylene, this study used ethylene to induce senescence and explored the genes and the best metabolite combinations that regulate carnation senescence, in order to provide a reference for revealing the mechanism by which ethylene regulates it. Using data on two carnation varieties, Master and Whitesnow, from the School of Horticulture and Forestry of Huazhong Agricultural University, and building on the results of traditional differential gene screening methods, the study tried multiple clustering methods and classification algorithms for joint multi-omics analysis. The work comprised the following three main tasks.

(1) Data quality control. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were applied to observe the natural clustering trends among the samples and the data as a whole. From these, the clustering trends and outliers were derived, and clearly abnormal outliers were identified or removed. PCA was used for quality control of the gene data because it better demonstrates the natural clustering trends in the data set.

(2) Selection of differential genes. The gene data were first screened by combining the log2 FC values calculated by the fold change method with the VIP (variable importance in projection) values calculated by orthogonal partial least squares discriminant analysis (OPLS-DA), giving the set of differential genes related to ethylene regulation of petal senescence in carnations. This set was then compared with the sets of differential genes screened by four clustering methods: fuzzy C-means clustering, k-means clustering, partitioning around medoids and hierarchical clustering. The results showed that fuzzy C-means clustering was more effective than the other three methods at selecting the differential genes.

(3) Selection of the best metabolite combinations. First, the differential gene set obtained by the fold change method combined with OPLS-DA was used as the benchmark: the screened differential genes were labelled 1 and the unscreened genes were labelled 0, yielding a data set with gene category labels. Then the data set was partitioned by the hold-out method and by ten-fold cross-validation respectively, and the labelled data were classified using random forest, support vector machine, logistic regression, XGBoost and Gaussian naive Bayes models to obtain the metabolite combinations with a label value of 1 under each model. Finally, the screening performance of the metabolite combinations was compared on six criteria: ROC curve, AUC value, accuracy, precision, recall and F1-score. The experimental results showed that random forest was the most effective method for selecting the best metabolite combination under both data set partitioning methods.
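
The labelling and evaluation steps in task (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The cutoffs (`fc_cutoff`, `vip_cutoff`), the toy expression values, the function names and the predicted labels are all hypothetical and chosen only for illustration. The sketch covers the benchmark labels and four of the six criteria (accuracy, precision, recall, F1-score); the classifiers, data partitioning, ROC curves and AUC values are left out.

```python
import math

def log2_fold_change(treated_mean, control_mean):
    """log2 of the ratio of mean expression (treated / control)."""
    return math.log2(treated_mean / control_mean)

def benchmark_label(log2_fc, vip, fc_cutoff=1.0, vip_cutoff=1.0):
    """Return 1 if a gene passes both cutoffs, else 0.

    Both cutoffs are assumptions; the abstract does not state the values used.
    """
    return int(abs(log2_fc) >= fc_cutoff and vip >= vip_cutoff)

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical toy data: (treated mean, control mean, VIP) for four genes.
genes = [(8.0, 2.0, 1.5), (3.0, 2.9, 0.4), (1.0, 4.0, 1.2), (5.0, 4.0, 1.8)]
y_true = [benchmark_label(log2_fold_change(t, c), v) for t, c, v in genes]

# Hypothetical output of some classifier on the same four genes.
y_pred = [1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(y_true, metrics)
```

Running the same metric function on each model's predictions would give one row of a comparison table per model, which is the kind of side-by-side comparison the abstract describes.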
Keywords/Search Tags:Differential Genes, Optimal Metabolites, Principal Component Analysis, Fuzzy C-Means Clustering, Random Forest