| Triple-negative breast cancer (TNBC) is a highly heterogeneous subtype of breast cancer. Accurate and stable classification of its six molecular subtypes, which are defined by molecular characteristics and have unique expression and ontology profiles, is crucial for understanding the pathobiology of TNBC and providing personalized treatment. This paper proposes a classification framework that integrates mRNA and long non-coding RNA (lncRNA) gene expression data to effectively distinguish TNBC subtypes. The proposed gene selection algorithm, DGGA, combines the GeneRank score with the gene importance generated by a deep neural network (DNN) during gene scoring, which accounts for inter-class differences as well as gene interactions and effectively removes redundant genes. In addition, a gene similarity matrix is embedded in the DNN for sparse learning, and the weight changes during network training are taken into account when measuring the relative importance of each gene. Finally, a genetic algorithm is used to simulate the natural evolution process, and the optimal gene subset for TNBC subtype classification can be found by continuously changing the gene combination. All performance evaluations in this study were performed using cross-validation, and the experimental results show that the DGGA algorithm identifies fewer genes while obtaining more accurate classification results in the TNBC classification task. The main innovations of this paper can be summarized in three aspects: 1) A new framework is proposed to distinguish the six subtypes of TNBC; this is one of only a handful of studies that perform the classification based on mRNA and lncRNA expression data. 2) The DGGA gene selection algorithm is proposed. The algorithm makes full use of all genetic information and considers the effects of various factors on the classification results, including gene expression levels, inter-class discrimination, interconnections between genes, and classifier performance. Compared with other gene selection methods, the DGGA algorithm identifies more discriminative features and is more competitive, achieving higher classification performance with a smaller subset of genes. 3) The gene connectivity matrix embedded in the DNN model not only enables sparse learning to prevent overfitting during training, but also improves feature selection by combining the gene importance measurement with weight changes. |
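
The scoring-then-search idea described above can be illustrated with a minimal sketch. It is not the DGGA implementation: the abstract does not give the exact combination rule, the DNN architecture, or the genetic operators, so everything below is an assumption made for illustration. The GeneRank-style score uses the commonly cited linear-system form with damping factor `d`; the DNN importance is replaced by a random placeholder vector; the mixing weight `alpha`, the helper names (`generank`, `combined_score`, `ga_select`), and the toy fitness are hypothetical. In the paper's setting the fitness would be a cross-validated classifier score rather than the toy function used here.

```python
import numpy as np


def generank(adjacency, expr_change, d=0.5):
    """GeneRank-style score: solve (I - d * W * D^-1) r = (1 - d) * ex."""
    W = np.asarray(adjacency, dtype=float)
    ex = np.abs(np.asarray(expr_change, dtype=float))
    deg = W.sum(axis=0)
    deg[deg == 0] = 1.0  # isolated genes: avoid division by zero
    A = np.eye(len(ex)) - d * (W / deg)  # W / deg scales column j by 1/deg[j]
    return np.linalg.solve(A, (1.0 - d) * ex)


def minmax(x):
    """Rescale a vector to [0, 1]; a constant vector maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def combined_score(rank_score, dnn_importance, alpha=0.5):
    """Assumed combination rule: convex mix of the two normalized scores."""
    return alpha * minmax(rank_score) + (1 - alpha) * minmax(dnn_importance)


def ga_select(scores, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Tiny genetic algorithm over boolean gene masks (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    p_init = 0.1 + 0.8 * minmax(scores)  # high-scoring genes start selected more often
    pop = rng.random((pop_size, n)) < p_init
    best_mask, best_fit, history = None, -np.inf, []
    for _ in range(generations):
        fits = np.array([fitness(ind) for ind in pop])
        i_best = int(np.argmax(fits))
        if fits[i_best] > best_fit:
            best_mask, best_fit = pop[i_best].copy(), float(fits[i_best])
        history.append(best_fit)
        # Binary tournament selection
        a = rng.integers(0, pop_size, size=pop_size)
        b = rng.integers(0, pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Uniform crossover with a shuffled partner
        partners = parents[rng.permutation(pop_size)]
        children = np.where(rng.random((pop_size, n)) < 0.5, parents, partners)
        # Bit-flip mutation
        children ^= rng.random((pop_size, n)) < p_mut
        children[0] = best_mask  # elitism: keep the best mask seen so far
        pop = children
    return best_mask, best_fit, history


# Toy data: random gene network, expression changes, and placeholder importances.
rng = np.random.default_rng(1)
n_genes = 12
W = np.triu(rng.random((n_genes, n_genes)) < 0.3, k=1).astype(float)
W = W + W.T  # symmetric adjacency without self-loops
fold_change = rng.normal(size=n_genes)
dnn_importance = rng.random(n_genes)  # stand-in for DNN-derived importance
scores = combined_score(generank(W, fold_change), dnn_importance)


def toy_fitness(mask):
    """Reward high combined scores, penalize subset size (not a real classifier)."""
    return float(scores[mask].sum() - 0.3 * mask.sum())


best_mask, best_fit, history = ga_select(scores, toy_fitness)
print(np.flatnonzero(best_mask), round(best_fit, 3))
```

The size penalty in `toy_fitness` stands in for the stated goal of reaching good classification with fewer genes; swapping in a cross-validated accuracy term would bring the sketch closer to the evaluation protocol the abstract describes.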