
Research On Breast Cancer Biomarkers Recognition Algorithm Based On Hybrid Ensemble

Posted on: 2024-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Ye
Full Text: PDF
GTID: 2544306944952349
Subject: Mathematics
Abstract/Summary:
Globally, breast cancer has surpassed lung cancer to become the most commonly diagnosed cancer in the world, and it is the leading cause of cancer incidence and death among women. Identifying reliable breast cancer biomarkers plays an important role in guiding disease diagnosis and in discovering key genetic information. With the development of high-throughput technology, the establishment of large omics databases such as The Cancer Genome Atlas (TCGA) has provided valuable resources for biomarker research. As an effective tool for processing high-dimensional data, feature selection is widely used in bioinformatics for biomarker recognition. Existing feature selection methods, however, face several difficulties in practice. The small sample size and high dimensionality of gene expression data, together with the noise introduced by complex experimental processes, reduce the reliability of gene expression analysis to varying degrees. These factors can leave a model with insufficient generalization ability and thereby distort the results of the analysis. This thesis therefore designs and implements a feature selection algorithm based on hybrid ensemble, aiming to improve the stability and the classification performance of feature selection at the same time. The algorithm robustly combines the diverse opinions of multiple feature selectors and minimizes dependence on particular data samples in order to select a stable and reliable feature subset. In addition, a theoretically supported interpretation of the stability measurement process is provided, and an interpretable biomarker recognition strategy is established. Firstly, differentially expressed genes are selected from the samples by significance analysis to reduce interference from noisy data. Secondly, strategies for measuring feature stability under data sampling and randomness are analyzed, and, based on the meaning and properties of the stability estimation models, optimal aggregation strategies for the different models are proposed for the ensemble algorithm in this thesis. Thirdly, following the concept of hybrid ensemble, a classification prediction model is established by operating on multiple feature subsets simultaneously. The resulting hybrid ensemble feature selection algorithm makes full use of the knowledge held by the different subsets when providing reference decisions, and it makes the subsets complement one another through adaptive weighting by stability measures. On the TCGA breast cancer data, the hybrid ensemble feature selection algorithm is compared with other ensemble and traditional feature selection algorithms. The comparison demonstrates the effectiveness of the algorithm: the stability of feature selection is significantly enhanced while classification prediction ability is maintained or even improved. Finally, the optimal subset of biomarkers is selected using comprehensive evaluation indicators, and its biological significance is verified.
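The general idea of stability-weighted ensemble feature selection described in the abstract can be illustrated with a small sketch. It is not the algorithm of the thesis, whose details are not given here. Everything in it is an assumption made for illustration: the two scoring functions (`welch_t`, `mean_diff`), the stratified bootstrap, the mean pairwise Jaccard index as the stability measure, the weighting of each selector's votes by its own stability, and the toy data generator `make_toy_data`.

```python
import random
from itertools import combinations
from statistics import mean, variance

def welch_t(a, b):
    """Absolute Welch t-statistic; returns 0.0 if both groups have zero variance."""
    denom = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0

def mean_diff(a, b):
    """Absolute difference of group means (a second, simpler scorer)."""
    return abs(mean(a) - mean(b))

def top_k(X, y, scorer, k):
    """Score every feature between the two classes and keep the k best."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(scorer(pos, neg))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])

def stratified_bootstrap(X, y, rng):
    """Resample with replacement inside each class (data perturbation)."""
    idx = []
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        idx.extend(rng.choices(members, k=len(members)))
    return [X[i] for i in idx], [y[i] for i in idx]

def jaccard_stability(subsets):
    """Mean pairwise Jaccard index of the selected subsets (1.0 = identical)."""
    return mean(len(a & b) / len(a | b) for a, b in combinations(subsets, 2))

def hybrid_ensemble_select(X, y, scorers, k, n_boot=20, seed=0):
    """Combine several scorers over bootstrap samples, weighting each scorer's
    votes by its own stability (an illustrative weighting, assumed here)."""
    rng = random.Random(seed)
    votes = [0.0] * len(X[0])
    stabilities = {}
    for name, scorer in scorers.items():
        subsets = []
        for _ in range(n_boot):
            Xb, yb = stratified_bootstrap(X, y, rng)
            subsets.append(top_k(Xb, yb, scorer, k))
        stab = jaccard_stability(subsets)
        stabilities[name] = stab
        for subset in subsets:
            for j in subset:
                votes[j] += stab / n_boot
    ranked = sorted(range(len(votes)), key=lambda j: votes[j], reverse=True)
    return ranked[:k], stabilities

def make_toy_data(n_per_class=20, n_features=10, shift=4.0, seed=1):
    """Synthetic data: only features 0 and 1 differ between the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                row[0] += shift
                row[1] += shift
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_data()
selected, stabilities = hybrid_ensemble_select(
    X, y, {"welch_t": welch_t, "mean_diff": mean_diff}, k=2
)
print("selected features:", sorted(selected))
print("per-selector stability:", stabilities)
```

In this sketch a selector that returns different subsets on different bootstrap samples contributes less to the final ranking than one that returns consistent subsets. A real study on gene expression data would also need a classifier and held-out evaluation to check that the stable subset keeps its predictive ability.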
Keywords/Search Tags: Feature selection, Ensemble learning, Stability measure, Breast cancer