
Prediction Of Local Recurrence Of Head And Neck Cancer Unimodality Based On Small Sample And High-dimensional Gene Expression Data

Posted on: 2021-02-18 | Degree: Master | Type: Thesis
Country: China | Candidate: H M Yu | Full Text: PDF
GTID: 2404330614460749 | Subject: Engineering
Abstract/Summary:
As the number of cancer types and cancer patients grows, research on cancer continues to deepen. At the same time, with the development of genomics, gene chip and gene sequencing technologies have gradually matured, and gene expression profiles are increasingly used to classify and predict cancer and to identify therapeutic targets. Head and neck cancer has become the sixth most frequent cancer in the world. Its prognosis is poor: the five-year survival rate is below 50%, and the disease is highly invasive, with a high metastasis rate and a high postoperative recurrence rate. Few studies so far have examined head and neck cancer at the gene level, so such research is of great significance for its treatment and prediction. Gene expression data are high-dimensional with small sample sizes; most of the genes are housekeeping genes, and only a few are tissue-specific genes related to cancer. Gene screening is therefore needed before studying the mechanism of cancer development or building a prediction model. This thesis proposes two feature selection methods for high-dimensional, small-sample gene data. The main contributions are as follows:
(1) To address the imbalanced class distribution in cancer data, stratified K-fold cross-validation is used in model training so that the proportion of positive and negative samples in the training set and test set is consistent with the original data set. In addition, the average classification accuracy rate is used in place of the overall classification accuracy rate. It gives equal weight to the minority class and the majority class, and is therefore more sensitive to performance changes on minority class samples.
(2) The Fisher score algorithm considers only the relevance between features and class labels and ignores redundancy among features. To address this, a maximum-relevance, minimum-redundancy algorithm based on distance measurement and non-dominated theory is proposed. After the Fisher score is used for ranking and pre-screening, feature similarity is measured and the features dominated by the current feature are removed according to the non-dominated theory. The remaining features are processed one by one in the same way, which finally yields a feature set with maximum relevance and minimum redundancy. Compared with the feature selection algorithm based on Fisher score and approximate Markov blanket theory, the proposed algorithm searches more efficiently, and the selected feature subset gives better classification performance.
(3) A multi-objective optimization algorithm is applied to the feature selection problem. Feature selection aims to retain the best feature set while minimizing the feature dimension, and the quality of a feature set is evaluated by the classification performance of the model. The multi-objective particle swarm optimization algorithm is improved in two ways. Because a randomly initialized population is unpredictable, a Fisher-score-based population initialization strategy is proposed so that the search can approach the global optimum sooner after initialization. Because local search ability is weak in the later stage of iteration, the particle velocity update formula is improved and a mutation strategy is applied to the particles so that they can escape local optima. Theoretical analysis and experiments show that this method iterates faster, selects fewer features, and performs better than other multi-objective optimization algorithms.
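The building blocks named in items (1) and (2) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names are hypothetical, the Fisher score follows the common textbook definition (between-class scatter divided by within-class scatter), and the "average classification accuracy rate" is assumed here to mean the mean of per-class accuracies.

```python
from collections import defaultdict
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Fisher score per feature (textbook form, assumed):
    sum_k n_k * (mean_k - mean)^2 / sum_k n_k * var_k."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = mean(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            between += len(values) * (mean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within == 0:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def stratified_folds(y, k):
    """Split sample indices into k folds, dealing each class out
    round-robin so every fold keeps roughly the original class ratio."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        for pos, i in enumerate(by_class[label]):
            folds[pos % k].append(i)
    return folds


def average_class_accuracy(y_true, y_pred):
    """Mean of per-class accuracies, so the minority class counts
    as much as the majority class (assumed reading of the metric)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return mean(correct[c] / total[c] for c in total)
```

On a set with eight negative and two positive samples, a classifier that always predicts the negative class reaches 80% overall accuracy, but its average class accuracy is only 50%, which is why the second measure reacts more strongly to errors on the minority class.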
Keywords/Search Tags:small sample and high-dimensional, gene expression data, feature selection, non-dominated feature, multi-objective optimization