| Brassica napus is an important oil-bearing economic crop whose oil content reaches 35%-50%. Rapeseed yield is affected by various abiotic stresses. It is therefore an important scientific problem to identify stress-responsive genes and to breed new Brassica napus varieties that adapt better to the external environment, thereby improving yield. This paper uses a penalized logistic regression model to identify crucial genes in the stress response. The existing structured penalized logistic regression (SPR) model calculates the correlation weights between pairs of features, and between features and the response variable, from Pearson correlation coefficients. However, the Pearson correlation coefficient is not suitable for small-sample data, and it cannot measure the correlation between qualitative and quantitative variables. We therefore make two improvements to the SPR model: (1) each feature is mapped to a harmonic curve function, and the correlation between two features is measured by the distance between their harmonic curves; (2) the weight between a feature and the response variable is described by a signal-to-noise ratio (SNR) function. These two improvements not only remove the SPR model's unsuitability for small samples but also avoid using the Pearson correlation coefficient to relate qualitative and quantitative variables. We call the improved model the structured penalized logistic regression model with harmonic curves and SNR function (H-SPR). The H-SPR model is validated on simulation datasets and two cancer datasets. The results show that the H-SPR model is suitable for small-sample data and has good classification performance and high prediction accuracy. The H-SPR model was applied to the transcriptome data of
Brassica napus under five stress conditions. The results showed that (1) the H-SPR model was not affected by the correlation between features, and its feature selection was stable; (2) the H-SPR model identified not only the crucial genes found by other models but also crucial genes that other models could not identify. The corresponding Arabidopsis homologous genes have been confirmed to respond to stress conditions. Enrichment analysis and comparison with the literature indicate that the identified genes BnaA06g37950D, BnaC03g14320D, BnaA01g16970D and others may play a key role in the Brassica napus stress response and can serve as targets for molecular breeding or transgenic research. Identifying crucial genes provides target genes for molecular breeding, supporting the cultivation of stress-resistant varieties and higher yields. The H-SPR model can be applied to a wide range of high-dimensional data, especially biomedical omics data, where it can effectively perform classification, prediction and variable selection. This work is therefore of great significance to biomedical research. |
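
The two weighting ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the harmonic curve is taken to be the classical Andrews form, the curve distance is an L2 distance on [-pi, pi], and the SNR is the common two-class form |mean1 - mean0| / (sd1 + sd0). The function names `harmonic_curve`, `curve_distance` and `snr_weight` are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def harmonic_curve(x, t):
    """Andrews-style harmonic curve of vector x at angle t (assumed form)."""
    value = x[0] / math.sqrt(2)
    for j in range(1, len(x)):
        k = (j + 1) // 2  # harmonic order: 1, 1, 2, 2, 3, ...
        # Odd positions use sine terms, even positions use cosine terms.
        value += x[j] * (math.sin(k * t) if j % 2 == 1 else math.cos(k * t))
    return value


def curve_distance(x, y, grid=400):
    """Approximate the L2 distance between two harmonic curves on [-pi, pi]."""
    step = 2 * math.pi / grid
    total = 0.0
    for i in range(grid):
        t = -math.pi + (i + 0.5) * step  # midpoint rule
        total += (harmonic_curve(x, t) - harmonic_curve(y, t)) ** 2
    return math.sqrt(total * step)


def snr_weight(values, labels):
    """Assumed two-class SNR of one feature for a binary (0/1) response."""
    g0 = [v for v, c in zip(values, labels) if c == 0]
    g1 = [v for v, c in zip(values, labels) if c == 1]
    # Sample standard deviations; each class needs at least two observations.
    return abs(mean(g1) - mean(g0)) / (stdev(g1) + stdev(g0))


if __name__ == "__main__":
    # Toy expression values for two features across three samples.
    feature_a = [1.0, 2.0, 3.0]
    feature_b = [0.0, 0.0, 0.0]
    print(curve_distance(feature_a, feature_b))

    # One feature measured in six samples, three per class.
    print(snr_weight([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1]))
```

Under this assumed Andrews form, the curve distance equals the square root of pi times the Euclidean distance between the two vectors, so it needs no correlation estimate. The SNR weight needs only class means and standard deviations, so it is defined for a qualitative response paired with a quantitative feature.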