Research On The Method Of Prognostic Survival Analysis From Data With High Dimension And Small Sample Size

Posted on: 2024-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Liu
GTID: 2530306932480524
Subject: Computer Science and Technology
Abstract/Summary:
With the development of medical technology over the past decades, prognostic survival analysis of cancer patients has become increasingly important, and biomolecular markers have attracted considerable attention in cancer research. Gene expression profiles of cancer patients are high-dimensional, small-sample data. Analyzing them can reveal molecular mechanisms and support the construction of survival indicators, but many problems remain. First, traditional survival analysis methods are generally suited only to low-dimensional, large-sample data; with high-dimensional, small-sample data, the insufficient sample size and the high feature dimension prevent these methods from being applied directly. Second, most existing cancer prognostic survival analysis methods rely on univariate regression analysis, which limits their statistical effectiveness, and most of the reported biomolecular markers predict patients' survival risk only weakly. Third, the methods used to classify patients into prognostic risk categories involve too much human intervention, which makes the results partly subjective. Finally, postoperative treatments further complicate the survival analysis. As a result, two issues, significant gene selection and prognostic category discovery, remain poorly addressed, which limits the clinical application of studies on prognostic survival analysis of cancer.

This thesis explores prognostic survival analysis methods for high-dimensional, small-sample data. To address the shortcomings of existing methods on such data, it proposes two methods: a top-down gene selection method and a bottom-up gene combination method. The top-down gene selection method uses resampling together with recursive and cumulative voting to obtain the important feature sets, which reduces the feature dimension and improves the accuracy and robustness of the model. The bottom-up gene combination method exhaustively enumerates combinations of the selected important features, clusters the samples on each combination, and performs survival analysis on the resulting sample groups to find the key features most closely related to patient prognosis. The top-down method reduces the number of features but cannot identify the feature combinations most closely related to prognostic survival, while the bottom-up method cannot perform multivariate regression analysis on the whole genome. The thesis therefore combines the two in a complementary way and proposes a combined top-down and bottom-up survival analysis method.

Experiments on simulated datasets demonstrate the effectiveness of the top-down gene selection method and of the bottom-up gene combination method individually. The reliability of the combined method and the validity of the selected features were verified on glioblastoma data from two publicly available datasets, The Cancer Genome Atlas (TCGA) and the Chinese Glioma Genome Atlas (CGGA). In addition, experiments on the clinical information of glioblastoma patients found one clinical attribute to be associated with patient prognosis. Together, these results suggest that the proposed combined top-down and bottom-up method is effective for prognostic survival analysis of high-dimensional, small-sample data.
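The abstract does not spell out the top-down procedure, so the following is only a minimal sketch of one way resampling, recursive elimination, and cumulative voting could fit together. Everything in it is an assumption made for illustration: the function names `c_index` and `top_down_select`, the parameters `n_keep`, `n_resamples`, and `drop_frac`, and in particular the scoring step, which uses a univariate concordance index as a dependency-free stand-in rather than whatever model the thesis actually fits.

```python
import random


def c_index(risk, times, events):
    """Harrell's concordance index: the fraction of comparable pairs in which
    the subject with the shorter observed survival has the higher risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else 0.5


def top_down_select(X, times, events, n_keep, n_resamples=30, drop_frac=0.5, seed=0):
    """Hypothetical top-down selection: shrink the feature set round by round.

    In each round, every bootstrap resample votes for its best-scoring
    features; votes accumulate across resamples, and only the most-voted
    features survive into the next (smaller) round.
    """
    rng = random.Random(seed)
    n = len(X)
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        n_top = max(n_keep, int(len(active) * (1 - drop_frac)))
        votes = {j: 0 for j in active}
        for _ in range(n_resamples):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
            t = [times[i] for i in idx]
            e = [events[i] for i in idx]
            # Assumed score: distance of the univariate C-index from chance (0.5).
            score = {
                j: abs(c_index([X[i][j] for i in idx], t, e) - 0.5) for j in active
            }
            for j in sorted(active, key=score.get, reverse=True)[:n_top]:
                votes[j] += 1  # cumulative vote
        active = sorted(active, key=votes.get, reverse=True)[:n_top]
    return sorted(active)
```

Here `X` is a list of samples (each a list of expression values), `times` holds observed survival times, and `events` marks uncensored observations with 1. Voting over many resamples rather than ranking once is what gives the selection some stability when the sample size is small.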
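The bottom-up idea (enumerate feature combinations, cluster the samples on each, and compare the survival of the resulting groups) can be sketched in the same spirit. The abstract names neither the clustering algorithm nor the survival test, so this sketch assumes a plain two-cluster k-means and a two-group log-rank statistic. The names `two_means`, `logrank_stat`, and `best_combination`, and the `max_size` parameter, are hypothetical, and features are assumed to be on comparable scales.

```python
from itertools import combinations


def _dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, n_iter=50):
    """Plain k-means with k=2 and a deterministic start (assumed clustering step)."""
    centers = [min(points, key=sum), max(points, key=sum)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            0 if _dist2(p, centers[0]) <= _dist2(p, centers[1]) else 1
            for p in points
        ]
        updated = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centers[k])
        if updated == centers:
            break
        centers = updated
    return labels


def logrank_stat(times, events, labels):
    """Two-group log-rank chi-square statistic (larger = groups differ more)."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        at_risk = [lab for tt, lab in zip(times, labels) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e in zip(times, events) if e and tt == t)
        d1 = sum(
            1 for tt, e, lab in zip(times, events, labels) if e and tt == t and lab
        )
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0


def best_combination(X, times, events, features, max_size=2):
    """Enumerate every feature subset up to max_size and keep the best split."""
    best = (0.0, ())
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            points = [tuple(row[j] for j in combo) for row in X]
            stat = logrank_stat(times, events, two_means(points))
            if stat > best[0]:
                best = (stat, combo)
    return best
```

Exhaustive enumeration grows combinatorially with the number of candidate features, which is consistent with the abstract's point that the bottom-up method is applied to a pre-selected feature set rather than to the whole genome.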
Keywords/Search Tags: Feature selection, Expression profiles, Survival analysis, Glioblastoma