
Research On Hybrid Feature Selection Algorithm For Tumor Staging Diagnosis

Posted on: 2021-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: M G Liu
Full Text: PDF
GTID: 2404330620472181
Subject: Computer technology
Abstract/Summary:
Tumors are among the diseases with the highest mortality in today's society, and at present there is no way to cure them completely. In clinical medicine, the degree of cure can only be judged from the prognosis, and recurrence cannot be ruled out. With the development of bioinformatics and improvements in medical care, and thanks to advances in computer storage and chip technology, more and more biomedical data can be stored. Using computer technology, machine learning, and data mining to extract useful information from medical data, in order to accurately predict the degree of tumor deterioration and guide further treatment, is a current research hotspot.

Human life processes are governed by complex regulatory mechanisms: each life activity involves thousands of genes, and each life process has its own intermediate products. Common data describing these processes include methylation omics data, transcriptome data, and proteomics data. This thesis uses the first two types of omics data. In vivo, enzyme-catalyzed methylation is involved in heavy metal modification, control of gene expression, control of protein function, and RNA processing. The transcriptome is the product of DNA transcription and is used to study the production and types of RNA in specific cells and organs. Many studies have shown that methylation and transcriptome data are closely related to the occurrence and development of tumors, so studying these data is very important for tumor staging.

However, tumor omics data usually have few samples and a large number of genes, that is, a "large p, small n" distribution, which poses a huge challenge for direct modeling. If such data are used for modeling directly, the model becomes too complex and inefficient to run, and it also suffers from overfitting and poor prediction performance. Reducing the dimensionality of the data and removing redundant and useless features is an effective way to solve this kind of problem. Based on methylation and transcriptome data for breast cancer, this thesis studies two feature selection methods and strategies for different problem types and carries out a series of studies on breast cancer staging.

In the first method, we study "lr_backfs", a three-step integrated feature selection method that is based on the continuous relationship between tumor stages and combines classification and regression. First, lasso regression is used to sparsify the original data, keeping the features whose weights are nonzero. Next, recursive feature elimination deletes the features that contribute little to the prediction model. Finally, backfs is used to remove redundant features. In the end, 259 features are selected from more than 550000 features for the multiple data sets. Opt (ACC * R2) is 72.24%, accuracy (ACC) is 95.07%, and R2 is 75.98%.

In the second method, we study "LMRFELRSOR", a four-step hybrid feature selection method that combines the "OR" strategy with the "LMRFELRS" method. The "LMRFELRS" method has four steps. First, regularization is used to sparsify the original training data, and the features with nonzero weights are selected. Second, penalty-term-based and tree-based feature selection strategies are applied. Third, recursive feature elimination deletes features of low importance. Finally, an improved heuristic search strategy that adds L features and removes R features (plus-L minus-R search) is used to remove redundant features. The "OR" strategy mainly selects classes with similar numbers of samples for binary classification and then aggregates the final results. In the end, 173 features are selected out of 550000 features, and a prediction accuracy of 99.09% is achieved.
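The "add L, remove R" search mentioned in the last step can be illustrated with a minimal sketch. The abstract does not describe the improved variant used in the thesis, so what follows is only the textbook plus-L minus-R search (with L > R), written in plain Python. The function name `plus_l_minus_r`, the scoring function `toy_score`, and the feature names g1 to g4 are all hypothetical; in practice the score would presumably come from a cross-validated model rather than a lookup table.

```python
from typing import Callable, FrozenSet, Iterable, Set


def plus_l_minus_r(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    target_size: int,
    l: int = 2,
    r: int = 1,
) -> Set[str]:
    """Textbook plus-L minus-R search (not the thesis's improved variant)."""
    pool = list(features)  # candidate features not currently selected
    if not l > r >= 0:
        raise ValueError("this forward variant requires l > r >= 0")
    if not 0 < target_size <= len(pool):
        raise ValueError("target_size must be between 1 and the feature count")
    selected: list = []

    def add_best() -> None:
        # Forward step: add the candidate that gives the highest subset score.
        best = max(pool, key=lambda f: score(frozenset(selected + [f])))
        pool.remove(best)
        selected.append(best)

    def drop_worst() -> None:
        # Backward step: drop the feature whose removal leaves the best score.
        worst = max(
            selected,
            key=lambda f: score(frozenset(x for x in selected if x != f)),
        )
        selected.remove(worst)
        pool.append(worst)  # a dropped feature may be picked again later

    while len(selected) < target_size:
        exhausted = False
        for _ in range(l):
            if not pool:
                exhausted = True
                break
            add_best()
        if exhausted:
            break  # every feature is selected; only pruning remains
        for _ in range(r):
            drop_worst()
    while len(selected) > target_size:
        drop_worst()
    return set(selected)


# Hypothetical toy score: g1 and g2 are individually strong but redundant.
RELEVANCE = {"g1": 0.9, "g2": 0.85, "g3": 0.5, "g4": 0.1}


def toy_score(subset: FrozenSet[str]) -> float:
    total = sum(RELEVANCE[f] for f in subset)
    if {"g1", "g2"} <= subset:
        total -= 0.8  # redundancy penalty for the correlated pair
    return total


chosen = plus_l_minus_r(RELEVANCE, toy_score, target_size=2)
print(sorted(chosen))  # ['g1', 'g3']
```

In this toy run the redundant feature g2 is never kept alongside g1, which is the behavior the redundancy-removal step is meant to achieve. The backward step is what distinguishes this search from plain forward selection: a feature added early can be discarded once a better combination appears.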
Keywords/Search Tags:Tumor staging, machine learning, hybrid feature selection, regression and classification