Study On Soybean Characteristics Of Different Origin Based On Integrated Analysis Of Multi-omics Data

Posted on:2024-05-18

Degree:Master

Type:Thesis

Country:China

Candidate:Q Y Zheng

Full Text:PDF

GTID:2543307121495164

Subject:Computer Science and Technology

Abstract/Summary:

Soybean is one of the world's important crops, and its quality and origin matter to both consumers and the trade market. An accurate and reliable method for identifying soybean origin is therefore needed. The pressing problems in soybean origin identification are how to find biomarkers and how to screen and analyze multiple data sources. In this study, we collected 108 samples from the main soybean-producing areas of Heilongjiang and Liaoning provinces and obtained metabolomic and transcriptomic data using non-targeted liquid chromatography-mass spectrometry (LC-MS) and Illumina sequencing, respectively.

To tackle the curse of dimensionality in the analysis of the sample metabolomics data, we combined several feature selection methods with multi-omics integration analysis to screen out the optimal feature subset as stable and accurate biomarkers for soybean origin identification. Based on linear logistic regression and four feature selection algorithms, namely L1-regularized logistic regression (L1-LR), Recursive Feature Elimination (RFE), Incremental Feature Selection (IFS), and Sequential Backward Selection (SBS), we established a model to select the optimal feature subset. We then optimized the selected model by combining the Correlation Bias Reduction (CBR) strategy with the LR-RFE method, which made the selected features more reliable as biomarkers by reducing the bias caused by correlation between them. By comparing the classification accuracy and the number of features of the models, we aimed to provide reliable feature selection methods and a basis for biomarker selection in soybean origin identification. This will help improve the accuracy and reliability of soybean quality identification and meet the needs of consumers and the trade market.

The main conclusions of the study are as follows:

(1) The feature selection method based on linear logistic regression, together with the multi-omics integration analysis strategy, can be applied to identifying the origin of soybeans from Heilongjiang and Liaoning provinces. The integrated multi-omics strategy is more accurate than using single-omics data, and the interim integration analysis method classifies slightly better than the preliminary integration analysis method.

(2) The LR-RFE + CBR algorithm effectively reduces the model's vulnerability to possible correlations between features during analysis. After optimization with LR-RFE + CBR, classification performance improved significantly both for models built on single-omics data and for models using the multi-omics integration strategy. The model based on the interim integration analysis method achieved the highest accuracy, 99.83%.

(3) The linear combined feature selection method based on logistic regression shows high model performance compared with the filter feature selection method, achieving a model performance of at least 0.97% on both single-omics data and integrated multi-omics data. Compared with the wrapper feature selection method, the combined method significantly reduces the number of features selected by the L1-LR feature selection method. This improves model performance and shows that backward feature selection methods can effectively remove redundant features from soybean omics data. Linear combined feature selection methods based on logistic regression therefore have great potential in the analysis of soybean omics data.

(4) Pathway analysis was used to verify the optimal feature subset selected by the optimized model combined with the interim integration analysis algorithm. This subset contains 33 transcriptomic features and 12 metabolic features. Pathway analysis shows that these features are clearly associated with one another, so they can be used as biomarkers to distinguish soybeans from Heilongjiang and Liaoning provinces. This result has important scientific significance and practical value for the study of soybean biomarkers of different origins.
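
The abstract names logistic regression with recursive feature elimination (LR-RFE) applied to integrated omics data but gives no implementation details. The following is only a minimal sketch of the general idea, not the thesis's pipeline. It assumes that integration can be approximated by concatenating the feature columns of the two omics blocks, uses a hand-written L2-penalised logistic regression instead of the L1-LR and CBR steps, and runs on synthetic data. The feature names (`T1`..`T3`, `M1`..`M3`), the hyperparameters, and the data generator are all hypothetical.

```python
import math
import random


def standardize(X):
    """Z-score each column so coefficient magnitudes are comparable."""
    n, d = len(X), len(X[0])
    cols = []
    for j in range(d):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mean) / std for v in col])
    return [[cols[j][i] for j in range(d)] for i in range(n)]


def fit_logreg(X, y, lr=0.5, l2=0.01, epochs=300):
    """Batch gradient descent for L2-penalised logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, then drop the feature
    with the smallest absolute coefficient, until n_keep remain."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        w, _ = fit_logreg(sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        active.pop(weakest)
    return active


# Synthetic data (an assumption, not the thesis data): 54 pairs of samples,
# one per origin, giving 108 samples. T1 and M1 track the origin label;
# the other columns are noise shared within each pair, so they carry no
# class information by construction.
rng = random.Random(0)
X, y = [], []
for _ in range(54):
    noise = [rng.gauss(0, 1) for _ in range(4)]
    for label in (0, 1):
        sign = 2 * label - 1
        transcriptome = [sign + rng.gauss(0, 0.2), noise[0], noise[1]]
        metabolome = [sign + rng.gauss(0, 0.2), noise[2], noise[3]]
        X.append(transcriptome + metabolome)  # concatenate the omics blocks
        y.append(label)

names = ["T1", "T2", "T3", "M1", "M2", "M3"]
selected = rfe(standardize(X), y, n_keep=2)
print([names[j] for j in selected])
```

In practice, a library implementation such as scikit-learn's `RFE` wrapped around `LogisticRegression` would normally replace the hand-written routines. They are spelled out here only to make the elimination loop visible.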

Keywords/Search Tags:

transcriptome, metabolome, multi-omics integration analysis, feature selection, origin identification

Related items

1	Network Integration Analysis Method Based On SmCCNet And Its Application In Sweet Potato Multi-omics Dat
2	Genome Assembly And Multi-omics Data Integration Of SilkDB 3.0 For Bombyx Mori
3	Multi-Omics Analysis Of The Integrated Transcriptome-Metabolome-Rhizosphere Microbiome Regulatory Model During The Development To Senescence Of Rhododendron
4	Multi-Omics Analysis Of The Predation Mechanism Of Arthrobotrys Conoides On Bursaphelenchus Xylophilus
5	Multi-Omics Analysis Of The Mechanism Of Pollen Abortion In ’Hua Nong’ Seedless Ponkan
6	The Multi-omics Research Of The Liver For Yaks Infected With Fascioliasis
7	Molecular Mechanism Of Tibial Dyschondroplasia In Broilers Based On Transcriptome, Proteome And Metabolome
8	A Multi-omics Study Of Herbaspirillum Huttiense 5-28 Inducing Resistance To Pst DC3000 In Arabidopsis
9	Study On The Developmental Mechanism Of Zanthoxylum Bungeanum Prickle Based On Morphology And Multi-Omics Analysis
10	Multi-Omics Analysis Reveals The Regulation Mechanism Of Fattening And Intramuscle Fat Deposition In Holstein Steer