
Prediction Of Fertility Protein Based On Machine Learning Methods

Posted on: 2022-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: J S Wang
GTID: 2480306764969289
Subject: Automation Technology
Abstract/Summary:
Proteins take part in many life activities and play important roles in biological processes such as fertility. Fertility-related processes include spermatogenesis, oogenesis, embryogenesis and other differentiation processes such as organogenesis, all of which are regulated by many proteins. During spermatogenesis, peritubular myoid cells use contractile filaments of actin and myosin to push testicular fluid containing immotile sperm into the rete testis, and they help form and maintain the blood-testis barrier by secreting fibronectin, collagen and proteoglycan. In oogenesis, the expression and function of TATA-binding protein 2 (TBP2) affect the progress of early follicular development. In embryogenesis, the cell adhesion protein E-cadherin plays an important role in the compaction and polarization stages.

Given the importance of fertility-related proteins, we built a classification model to predict three classes of them: spermatogenesis-related, oogenesis-related and embryogenesis-related proteins. First, we collected the same protein dataset used in previous research and extracted features based on amino acid composition and physicochemical properties. Next, features were selected by analysis of variance (ANOVA) combined with incremental feature selection (IFS). Finally, random forest (RF), extreme gradient boosting (XGBoost) and support vector machine (SVM) algorithms were used to build models for classifying these proteins.

While constructing the classifier, enrichment analysis showed that oogenesis-related proteins may also be enriched in pathways associated with embryogenesis-related proteins. This suggests that the two classes share characteristics, which could degrade classification performance. We therefore built a two-layer classifier. The first layer separates embryogenesis-related proteins from the other two classes (spermatogenesis-related and oogenesis-related proteins), and the second layer separates spermatogenesis-related proteins from oogenesis-related proteins. After comparing the evaluation results, we selected the SVM-based model, which performed well, as the final model, giving a two-layer SVM-based model for classifying the three classes of fertility-related proteins. On the independent test set, the first-layer classifier achieved an accuracy (ACC) of 81.95% and an area under the ROC curve of 0.89, and the second-layer classifier achieved an ACC of 84.74% and an area under the ROC curve of 0.90.

Enrichment analysis showed that spermatogenesis-related proteins were mainly enriched in the GO:0007283 (spermatogenesis) pathway, oogenesis-related proteins in the GO:0048477 (oogenesis) pathway, and embryogenesis-related proteins in the GO:0001701 (in utero embryonic development) pathway. We then combined the three classes and repeated the enrichment analysis. Embryogenesis-related proteins affected the enrichment of oogenesis-related proteins, whereas the enrichment of spermatogenesis-related proteins was not affected by the other two classes.
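
The pipeline summarized in the abstract (amino acid composition features, ANOVA-based feature ranking as the ordering that IFS walks through, and a two-layer decision) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: all function names are hypothetical, the physicochemical features are omitted, and the two layers are passed in as plain callables standing in for whatever trained models (SVM in the thesis) would be used.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # F is undefined without at least two groups and spare samples
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    between = sum(len(g) * (means[label] - grand_mean) ** 2
                  for label, g in groups.items())
    within = sum((v - means[label]) ** 2
                 for label, g in groups.items() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(X, y):
    """Feature indices sorted by descending F score.

    IFS would then add features in this order, one at a time,
    and keep the prefix that gives the best validation score.
    """
    scores = [anova_f_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def two_layer_predict(features, layer1, layer2):
    """Layer 1: embryogenesis vs. the rest. Layer 2: spermatogenesis vs. oogenesis.

    layer1 and layer2 are placeholder callables (assumptions for this sketch)
    that map a feature vector to a class-name string.
    """
    if layer1(features) == "embryogenesis":
        return "embryogenesis"
    return layer2(features)
```

Keeping the two layers as separate callables mirrors the design choice described above: the classes that enrichment analysis suggests are harder to tell apart are isolated into their own binary decisions instead of being handled by a single three-class model.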
Keywords/Search Tags: Fertility-Related Proteins, Machine Learning, Spermatogenesis, Oogenesis, Embryogenesis, Enrichment Analysis