| With the continuous development of artificial intelligence, using machine learning algorithms to mine useful information from gene expression profiles has become a widely adopted strategy across bioinformatics research. However, gene expression profiles are expensive to obtain, so the available sample sizes rarely meet the needs of data-driven machine learning algorithms. To alleviate the impact of small sample sizes, Spectral Normalization Generative Adversarial Networks (SN-GAN) can generate high-quality simulated data that enlarge the training set. However, because gene expression profiles are high-dimensional and have few samples, the samples generated by SN-GAN lack realism and diversity. When SN-GAN generates survival data with complex labels, the generated features are also insufficiently correlated with their labels. This thesis therefore improves SN-GAN to achieve effective augmentation of gene expression profiles. The main work is as follows. (1) To address the insufficient authenticity and diversity of generated gene expression profile classification data, a gene expression profile data augmentation method based on an SN-GAN with an adaptive generation boundary and label resampling (CSN-GAN-LK) is proposed. First, CSN-GAN-LK brings the distribution of the generated data as close as possible to that of the real samples. It does this with a learnable output-layer activation function, which lets the model determine the generation boundary adaptively and thus improves the authenticity of the generated data. Second, to increase the diversity of the generated data, kernel density estimation is used to resample the labels and obtain continuous labels for the generated samples, which in turn expands the feature space of the generated data. Results on several publicly available gene-expression-based disease diagnosis datasets show two things. Samples generated by CSN-GAN-LK are more authentic and more diverse than those generated by other related methods, and they improve the accuracy of gene-expression-based disease diagnosis. (2) To address the insufficient correlation between features and labels in generated gene expression profile survival data, a gene expression profile data augmentation method based on a multi-task, dynamically balanced SN-GAN is proposed. First, CSN-GAN-LK and Cox-nnet are combined into a multi-task learning model (CSN-GAN-CPP) that effectively couples the risk assessment and sample generation tasks. The missing label information is continuously supplemented during training to improve the quality of the generated data. Second, a split-order learning mechanism is introduced into training, and a dynamic balancing method is proposed for computing the generator loss. Together they effectively address the large training fluctuations and difficult convergence of CSN-GAN-CPP. Risk assessment and marker gene screening experiments were run on three publicly available bladder cancer prognosis datasets. They show that CSN-GAN-CPP effectively predicts patients' risk probabilities while generating high-quality gene expression profile survival data, and that it improves data augmentation for prognosis-related tasks. (3) An oncology diagnosis and treatment system based on the spectral normalization generative adversarial network is designed and implemented. The system uses Vue for the interactive interface, Java for the business logic, and Python for model training and inference. It contains a user login module, a data management module, a data augmentation module, a diagnosis and analysis module, and an auxiliary treatment module. Together, these modules help diagnose and analyze patients' diseases accurately and formulate effective treatment plans promptly. |
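The abstract only outlines the label-resampling step of CSN-GAN-LK. The following is a minimal sketch of one way to draw continuous conditioning labels from a kernel density estimate of the real labels. It assumes a one-dimensional Gaussian kernel, Silverman's rule-of-thumb bandwidth, and clipping of the resampled labels to [0, 1]; the abstract specifies none of these. The function name `kde_resample_labels` and its parameters are hypothetical and do not come from the thesis.

```python
import numpy as np


def kde_resample_labels(labels, n_samples, bandwidth=None, rng=None, clip=(0.0, 1.0)):
    """Hypothetical helper: draw continuous labels from a Gaussian KDE of real labels.

    Sampling from a Gaussian KDE is equivalent to picking an observed label
    uniformly at random and adding N(0, h^2) noise, where h is the bandwidth.
    """
    rng = np.random.default_rng(rng)
    y = np.asarray(labels, dtype=float).ravel()
    n = y.size
    if n < 2:
        raise ValueError("need at least two labels to fit a KDE")

    if bandwidth is None:
        # Assumption: Silverman's rule of thumb for a 1-D Gaussian kernel.
        sigma = y.std(ddof=1)
        iqr = np.subtract(*np.percentile(y, [75, 25]))
        spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
        bandwidth = 0.9 * spread * n ** (-1 / 5)

    # Pick kernel centres (observed labels) with replacement, then perturb them.
    centers = rng.choice(y, size=n_samples, replace=True)
    samples = centers + rng.normal(0.0, bandwidth, size=n_samples)

    # Assumption: keep labels in the valid range of a binary/probability label.
    if clip is not None:
        samples = np.clip(samples, *clip)
    return samples


if __name__ == "__main__":
    real = np.array([0] * 20 + [1] * 20)  # toy binary diagnosis labels
    fake_labels = kde_resample_labels(real, n_samples=100, rng=0)
    print(fake_labels.shape, fake_labels.min() >= 0, fake_labels.max() <= 1)
```

The resampled labels stay concentrated near the observed classes but also fill the gaps between them. This is one plausible reading of "continuous generated sample labels". How the thesis actually feeds such labels into the conditional generator is not described in the abstract.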