
Research On Feature Selection Algorithm In Anticancer Drug Reaction Prediction Model

Posted on: 2020-01-06 | Degree: Master | Type: Thesis
Country: China | Candidate: Q F Sun
GTID: 2370330596482652 | Subject: Control Engineering
Abstract/Summary:
A large body of clinical data indicates that patients with the same type of cancer often respond differently to the same treatment or drug because of patient-specific genetic differences. Targeted drug therapy that takes gene-specific effects into account has therefore emerged as a new form of cancer treatment. However, whether a patient carries the genes targeted by an anticancer drug strongly affects the therapeutic outcome, and clinical screening of anticancer drugs faces many challenges: demanding experimental conditions, long turnaround times, and high cost. With the rapid development of bioinformatics, genomic and drug-related data for tumor cells have been integrated into large databases and, combined with machine learning algorithms, are used to predict drug response at the cellular level, providing a sound basis for screening a variety of targeted anticancer drugs. Based on tumor cell line gene expression data and anticancer drug response values (IC50, the half-maximal inhibitory concentration), this thesis studies feature selection algorithms for building a reliable and accurate anticancer drug response prediction model. The work covers three main aspects:

(1) Correlating the genomic data of cancer cell lines in the CCLE (Cancer Cell Line Encyclopedia) with anticancer drug response data showed that gene expression data have a concentrated distribution and a more significant correlation with IC50, which makes them more suitable for predicting drug response. Hypothesis-test p-values for the correlation coefficients between gene expression and drug response were then calculated, and the genes selected at a given p-value threshold were found to be correlated in groups. This finding provided the basis for the feature selection algorithms and regression models developed in the subsequent work.

(2) To address the "curse of dimensionality" in gene feature data, screening methods for high-dimensional features were combined with penalized least-squares coefficient shrinkage. First, the SIRS (sure independent ranking and screening) algorithm was used to compute a marginal utility between each gene's expression and the IC50 of each of 21 drugs for preliminary screening, with the Pearson correlation hypothesis test used for comparison. The retained genes were then passed to different penalized least-squares shrinkage methods (LASSO, Elastic Net, and SCAD) for refined feature selection, providing effective predictors for building reliable regression models.

(3) Because SIRS applies to general (model-free) settings, and Elastic Net combines the variable selection of LASSO with the grouping effect of ridge regression, the study mainly used the combined SIRS_Elastic Net method for feature selection and compared it with the Elastic Net model in the paper that published the CCLE data. The effects of different combinations of screening and coefficient shrinkage on model performance were also compared. Lung cancer cell lines were then analyzed separately to build regression prediction models for anticancer drugs, and enrichment analysis of the predictive genes revealed their molecular biological functions and their roles in signaling pathways.

In summary, this thesis uses CCLE data to predict the response to 21 anticancer drugs, focusing on feature selection within the regression model and effectively combining screening with coefficient shrinkage. Using SIRS_Elastic Net for feature selection improved prediction accuracy: for most drugs the coefficient of determination (R²) exceeded 0.7, and the lung cancer model reached an R² of about 0.95. Enrichment analysis showed that the molecular functions and pathways associated with the predictive genes are biologically related to the corresponding anticancer drugs. This provides a basis for identifying the target genes of drugs in future research and also contributes to the screening of new anticancer drugs.
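Work item (1) filters genes by the hypothesis-test p-value of their correlation with IC50. A minimal sketch of that kind of filter follows, under stated assumptions: the gene names and toy data are hypothetical, the 0.05 cut-off is a placeholder (the abstract does not give the threshold), and the p-value uses Fisher's z approximation instead of the exact t-test on r, so it is only approximate for small samples.

```python
import math
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z transform.

    This normal approximation stands in for the exact t-test on r
    and needs n > 3 samples.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| == 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def correlation_screen(genes, ic50, alpha=0.05):
    """Keep genes whose expression-IC50 correlation is significant at alpha.

    genes: dict mapping a (hypothetical) gene name to its expression values
           across cell lines; ic50: responses for the same cell lines, same order.
    Returns {gene_name: (r, p_value)} for the retained genes.
    """
    n = len(ic50)
    kept = {}
    for name, expr in genes.items():
        r = pearson_r(expr, ic50)
        p = fisher_z_pvalue(r, n)
        if p < alpha:
            kept[name] = (r, p)
    return kept


# Toy data: GENE_A tracks IC50 linearly, GENE_B alternates and is only weakly related.
ic50 = [float(i) for i in range(10)]
genes = {"GENE_A": [2.0 * v + 1.0 for v in ic50],
         "GENE_B": [(-1.0) ** i for i in range(10)]}
selected = correlation_screen(genes, ic50)
print(sorted(selected))  # ['GENE_A']
```

Raising or lowering `alpha` trades the number of retained genes against the risk of keeping spurious ones; with thousands of genes a multiple-testing correction would usually be considered as well.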
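SIRS, used for the preliminary screening in (2), ranks each gene by a model-free marginal utility in which IC50 enters only through rank indicators. The sketch below follows the sample SIRS utility as commonly defined; each predictor is standardized on its own here (an assumption, since a full whitening is not feasible with thousands of genes and few cell lines), and the helper names, toy data, and `top_d` value are illustrative, not the thesis's settings.

```python
from statistics import fmean, pstdev


def standardize(col):
    """Center a column to mean 0 and scale it to unit (population) variance.

    Constant columns (pstdev == 0) must be removed beforehand.
    """
    m, s = fmean(col), pstdev(col)
    return [(v - m) / s for v in col]


def sirs_utility(x, y):
    """Sample SIRS marginal utility of one predictor x for the response y.

    omega = (1/n) * sum_j [ (1/n) * sum_i x_i * 1(y_i < y_j) ]^2,
    with x standardized. y enters only through the indicators 1(y_i < y_j),
    so the score is unchanged by any strictly increasing transform of y.
    """
    n = len(y)
    xs = standardize(x)
    total = 0.0
    for yj in y:
        inner = sum(xi for xi, yi in zip(xs, y) if yi < yj) / n
        total += inner * inner
    return total / n


def sirs_screen(genes, y, top_d):
    """Rank genes by SIRS utility and return the top_d gene names."""
    scores = {name: sirs_utility(expr, y) for name, expr in genes.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_d]


ic50 = [float(i) for i in range(10)]
genes = {"GENE_A": [0.5 * v for v in ic50],          # monotone in IC50
         "GENE_B": [(-1.0) ** i for i in range(10)]}  # alternating noise
print(sirs_screen(genes, ic50, top_d=1))  # ['GENE_A']
```

Because the utility depends on IC50 only through its ordering, it does not assume a linear relationship, which is what makes SIRS suitable as a general-purpose first filter before a penalized regression.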
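The penalized least-squares step in (2) and (3) can be illustrated with cyclic coordinate descent on the Elastic Net objective in the glmnet parameterization, where `alpha = 1` reduces to LASSO and `alpha = 0` to ridge regression. This is a minimal pure-Python sketch, assuming centered columns and a centered response (no intercept) and hypothetical values of `lam` and `alpha`; it is not the solver used in the thesis. The usage lines illustrate the grouping effect mentioned in (3): with two identical gene columns, this LASSO run puts all the weight on one copy, while Elastic Net splits it.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha=0.5, n_iter=200, tol=1e-8):
    """Elastic Net fitted by cyclic coordinate descent.

    Minimizes (1/(2n)) * ||y - X b||^2
              + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2),
    where X is a list of n rows with p columns. Columns of X and y are assumed
    centered, so no intercept is fitted. alpha=1 gives LASSO, alpha=0 ridge.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot enter the model
            # Inner product of column j with the partial residual (beta_j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Two identical (perfectly correlated) gene columns; y depends on their shared signal.
X = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
print(elastic_net_cd(X, y, lam=1.0, alpha=1.0))  # LASSO: weight lands on one copy
print(elastic_net_cd(X, y, lam=1.0, alpha=0.5))  # Elastic Net: weight shared equally
```

The ridge term adds `lam * (1 - alpha)` to each denominator, which makes the objective strictly convex and pushes correlated genes toward equal coefficients; this is the behavior that suits groups of co-expressed genes.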
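SCAD, the third shrinkage method in (2), differs from LASSO in how it treats large coefficients. For an orthonormal design both penalties reduce to closed-form thresholding rules, sketched below. The value `a = 3.7` is the one commonly recommended by Fan and Li (2001) and is an assumption here, since the abstract does not state it; the function names are illustrative.

```python
def soft_threshold(z, lam):
    """LASSO (soft) thresholding rule for an orthonormal design."""
    return max(abs(z) - lam, 0.0) * (1.0 if z >= 0 else -1.0)


def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule for an orthonormal design (requires a > 2)."""
    az = abs(z)
    sign = 1.0 if z >= 0 else -1.0
    if az <= 2 * lam:
        return sign * max(az - lam, 0.0)  # identical to soft thresholding
    if az <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)  # linear transition
    return z  # large signals are left unshrunk


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty value p_lam(|theta|): linear, then quadratic, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


# Compare the two rules on a few hypothetical unpenalized estimates z.
for z in (0.5, 1.5, 3.0, 10.0):
    print(z, soft_threshold(z, 1.0), round(scad_threshold(z, 1.0), 3))
```

Small estimates are zeroed exactly as LASSO would zero them, while estimates beyond `a * lam` pass through unchanged, so SCAD avoids the constant shrinkage bias that LASSO places on strong predictors.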
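The accuracy figures in the abstract (R² above 0.7 for most drugs, about 0.95 for the lung cancer model) are coefficients of determination. A minimal sketch of the standard formula follows; the observed and predicted values are hypothetical, and the abstract does not say whether R² was computed on training or held-out data.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot.

    SS_res is the residual sum of squares of the predictions and SS_tot the
    total sum of squares around the mean of y_true. On held-out data R^2 can
    be negative when predictions are worse than always predicting the mean.
    """
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted IC50 values for four cell lines.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r_squared(observed, predicted))  # SS_res = 1, SS_tot = 5
```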
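The abstract does not name the enrichment tool used on the predictive genes. A common approach for over-representation analysis is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which asks whether more predictive genes fall in a molecular-function term or pathway than chance would produce. The sketch below assumes that test; the function name and the gene counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n_selected, K, N):
    """One-sided over-representation p-value P(X >= k), X ~ Hypergeometric.

    N: genes in the background (e.g. all genes measured),
    K: background genes annotated to the term or pathway,
    n_selected: predictive genes selected by the model,
    k: selected genes that carry the annotation.
    """
    upper = min(K, n_selected)
    total = comb(N, n_selected)
    # Exact integer arithmetic for the tail sum; one division at the end.
    tail = sum(comb(K, i) * comb(N - K, n_selected - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 20000 background genes, 200 in a pathway,
# 50 predictive genes, 5 of which fall in the pathway.
print(hypergeom_enrichment_p(5, 50, 200, 20000))
```

When many terms are tested at once, the resulting p-values would normally be adjusted for multiple testing before a term is reported as enriched.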
Keywords: Anticancer drug response prediction, feature selection, Elastic Net, penalized least-squares estimation