| Gastric cancer is one of the most dangerous diseases in the world. Its mortality ranks fourth among all cancers, making it a serious threat to human health. Under current medical conditions, early diagnosis can greatly improve the survival rate of gastric cancer patients. Identifying effective gastric cancer biomarkers through techniques such as genetic testing, in particular nucleic acid biomarkers such as genes and miRNAs (microRNAs), facilitates early detection. Because the human body contains a large number of genes and miRNAs, it is not feasible to test each one experimentally as a candidate biomarker. Statistical and machine learning methods are therefore needed to find effective biomarkers, and diagnostic models are needed to verify them and to establish their relationship with gastric cancer. In this paper, we apply machine learning and neural networks to omics data related to gastric cancer to explore the relationship between genes or miRNAs and gastric cancer and to identify effective biomarkers. The main work is as follows: (1) To search for gene markers of gastric cancer, genes are screened using differential analysis and random forest methods. A neural network is then trained on a training dataset and validated on an independent dataset to establish and verify the relationship between the selected genes and gastric cancer. The experimental results show that four genes, INHBA, LYVE1, CD36 and COL10A1, are effective key genes for gastric cancer, and that the gene screening algorithm and the method for building diagnostic models are effective, providing a reliable algorithmic framework for the subsequent work. (2) Building on this framework, and to address its potential problems such as overfitting and incomplete feature selection, the RF-CS (Random Forest-Correlation Select) method and 
GCON (Gastric Cancer anti-Overfitting Net) network are constructed. For feature selection, the RF-CS method combines random forest with the correlation coefficient and selects LYVE1, INHBA, CD36, COL10A1 and CDH3 as gastric cancer biomarkers. For neural network modeling, a residual structure, regularization, and other techniques are used to address overfitting and improve network performance. The resulting GCON network is evaluated with independent dataset tests, ablation experiments, and comparisons against other methods to demonstrate its superiority. (3) Automatic feature selection and automated machine learning methods are proposed for screening miRNA markers. Building on differential analysis and random forest screening, and combining the wrapper method with architecture search, the W-NAS (wrapper-Neural Architecture Search) method is proposed to perform automatic feature selection and model construction simultaneously; several strategies, including variable-speed search, are proposed to limit search time. Finally, four miRNAs, including miR-5100, are identified as gastric cancer biomarkers. A comparison against other methods verifies the superiority of the method, and applying it to gene data for comparison with GCON further confirms its effectiveness. |
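
The abstract states only that RF-CS combines random forest with the correlation coefficient; it does not say how the two are combined. As an illustration of one plausible reading, and not the authors' actual procedure, the sketch below ranks features by an importance score (assumed to come from a random forest trained elsewhere) and greedily keeps a feature only if its absolute Pearson correlation with every feature already kept stays below a threshold. The function names, the threshold `max_abs_corr`, the cutoff `top_k`, and the toy values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def select_features(expression, importance, max_abs_corr=0.9, top_k=5):
    """Greedy importance-then-correlation filter (hypothetical RF-CS reading).

    expression: dict mapping feature name -> list of values across samples
    importance: dict mapping feature name -> importance score, assumed to be
                produced by a random forest that is not implemented here
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    selected = []
    for name in ranked:
        # Keep the feature only if it is not redundant with any kept feature.
        if all(
            abs(pearson(expression[name], expression[kept])) <= max_abs_corr
            for kept in selected
        ):
            selected.append(name)
        if len(selected) == top_k:
            break
    return selected


# Toy data with placeholder names and made-up values (not from the paper).
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 4.0, 6.2, 8.1, 9.9],  # nearly a multiple of gene_a
    "gene_c": [5.0, 1.0, 4.0, 2.0, 3.0],
    "gene_d": [0.5, 0.7, 0.2, 0.9, 0.4],
}
importance = {"gene_a": 0.40, "gene_b": 0.30, "gene_c": 0.20, "gene_d": 0.10}

chosen = select_features(expression, importance, max_abs_corr=0.9, top_k=3)
print(chosen)
```

In this toy run, `gene_b` is skipped because it is almost perfectly correlated with the higher-ranked `gene_a`, so the filter returns `gene_a`, `gene_c` and `gene_d`. Whether the paper applies the correlation coefficient between features, between each feature and the class label, or in some other way cannot be determined from the abstract.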