Background: High-grade serous ovarian cancer (HGSOC) is the most lethal primary malignancy of the female reproductive system. Surgery combined with cisplatin-based chemotherapy is the first-line treatment for ovarian cancer, yet around 80% of patients relapse within 2 years and the 5-year survival rate is only about 30%. Platinum resistance is currently recognized as the leading cause of recurrence and death in patients with ovarian cancer. Accurate prediction of chemotherapy efficacy and exploration of the underlying mechanism will therefore provide clues for precision treatment of ovarian cancer and for the research and development of targeted drugs. In recent years, with advances in computer technology, electronic engineering and statistics, artificial intelligence (AI) has made breakthroughs in solving complex problems in the medical field. Convolutional neural networks (CNNs), a typical representative of AI, can automatically learn feature representations from massive collections of medical images and show great advantages in disease diagnosis, classification, prognosis prediction and other domains. However, current CNN models face a "black box" dilemma: they lack schemes for explaining the causal relationships behind the model's predictions. This problem has long plagued deep learning and makes it difficult for such systems to be trusted and accepted by doctors. In this ovarian cancer study, a deep learning LASSO model of chemotherapeutic resistance based on pathological images was therefore constructed, and the correlation between pathomics features and tumor microenvironment components was innovatively analyzed through the integrated model.

Methods: 1. Data download: whole slide images (WSIs, fully digitized pathology slides) and the corresponding clinical information of ovarian cancer samples were obtained from the TCGA database, including pathomics data, survival status, survival time and chemotherapeutic resistance information. 2. Image preprocessing:
the WSI was cut into 300 × 300 pixel color blocks, and uninformative color blocks were discarded. 3. Deep learning: according to the patients' clinical information, the color blocks were categorized as normal, drug-resistant or drug-sensitive, and two CNN models were constructed: one distinguishing cancer from non-cancer and one distinguishing drug-resistant from drug-sensitive. Meanwhile, 1024-dimensional histological features were extracted from the color blocks. 4. Machine learning model construction and validation: five ML algorithms, Lasso (LA), AdaBoost (AD), naive Bayes (NB), XGBoost (XG) and random forest (RF), were used to optimize the CNN classification model and to determine the optimal machine learning model. A machine learning score was obtained from the model formula, and the generalization performance of the model was evaluated on the validation and test data sets. 5. Clinical factor analysis of chemotherapeutic resistance: the corresponding clinical data (age, pathological grade and stage) of 327 TCGA HGSOC patients were obtained. The chemotherapy resistance score of each HGSOC patient was calculated according to the optimal model formula, the patients were divided into a resistant group and a sensitive group, and the relationship between group membership and clinical characteristics was analyzed. 6. Predictive mechanism analysis of the optimal model: the relationship between the tumor immune microenvironment and the chemotherapy resistance score was analyzed in the 327 TCGA HGSOC patients. 7. Correlation analysis of key features and lymphocyte infiltration: the correlation between each feature in the optimal model formula and lymphocyte infiltration was analyzed. Key features were then selected for pathological analysis and validation.

Results: 1. CNN extracted histological features from different types of color blocks: the 327 TCGA patients were divided into a training set of 90 (30 normal tissues, 30 chemotherapy-sensitive and 30 chemotherapy-resistant) and a validation set of 237 (43 normal tissues, 139 chemotherapy-sensitive and 55 chemotherapy-resistant). The
area under the receiver operating characteristic curve (AUC) for classifying cancerous versus non-cancerous color blocks was 0.995; the AUC for distinguishing chemotherapy-sensitive from chemotherapy-resistant color blocks was 0.662. 2. Machine learning models accurately identified tumor regions: differential analysis of DHFs between the ovarian cancer and non-cancer WSI groups identified 738 significantly different DHFs. Five ML algorithms were used to construct classification models, and all five performed well in identifying tumor WSIs (AUC > 0.90). Among them, the LA model showed the best classification performance, with an AUC of 0.993. 3. LASSO regression produced the best machine learning model for predicting chemotherapeutic resistance: differential analysis of DHFs between the chemotherapy-sensitive and chemotherapy-resistant WSI groups identified 85 significantly different DHFs. Of the five ML algorithms, the LA model had the best overall performance, with an AUC of 0.760 in the validation group and 0.746 in the independent test data set. 4. Relationship between the score and clinical characteristics: clinical data (age, pathological grade and pathological stage) were collected from the TCGA database. Patients were divided into resistant and sensitive groups according to the LA model machine learning score, and this grouping was significantly associated with tumor grade (chi-square = 10.644, p = 0.014) and pathological stage (chi-square = 11.008, p = 0.012). Cox regression analysis (univariate and multivariate) indicated that the score was associated with overall survival (OS) and disease-free survival (DFS) in TCGA patients (HR > 1, p < 0.0001). Survival analysis showed that patients in the resistant group had worse OS and DFS than those in the sensitive group (p < 0.05). 5. Relationship between the score and the tumor
microenvironment: correlation analysis showed a significant negative correlation between the score and both the silent mutation rate and single nucleotide variant (SNV) neoantigens. 6. Feature TZ0279 was associated with lymphocyte infiltration in patient color blocks: TZ0279 was positively correlated with the lymphocyte infiltration fraction. Comparing the TZ0279 values of WSIs from resistant and sensitive patients, we found that the values were significantly higher in sensitive patients than in resistant patients. A pathologist randomly selected 180 color blocks from the clinical samples and counted the infiltrating lymphocytes. The TZ0279 value was positively correlated with the number of infiltrating lymphocytes.

Conclusion: 1. Two CNNs were constructed from pathological images; they distinguish tumor from non-tumor tissue and predict chemotherapeutic resistance versus sensitivity in ovarian cancer, respectively. In addition, 1024 histological feature values were extracted from the color blocks. 2. Comparison of the five machine learning models showed that the LA model had the best classification performance. The score was calculated according to the LA machine learning model formula, and patients were divided into a resistant group and a sensitive group. 3. Survival analysis showed that patients in the resistant group had worse OS and DFS than those in the sensitive group (p < 0.05). 4. The machine learning score was significantly negatively correlated with the silent mutation rate and SNV neoantigens. 5. The TZ0279 value was positively correlated with the number of infiltrating lymphocytes.
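
For illustration only, the scoring and grouping steps described in the Methods can be sketched as follows. This is a minimal sketch under stated assumptions, not the model used in the study: the abstract does not give the LA model formula, so the coefficients, the intercept, the feature name `feature_b`, the cutoff and the toy patients below are all hypothetical. Only the general shape (a linear score over selected features, a split into resistant and sensitive groups, and evaluation by AUC) follows the text.

```python
from typing import Dict, List, Sequence

# Hypothetical coefficients and intercept: the real LA (LASSO) model formula
# is not given in the abstract. "feature_b" is an invented placeholder name.
COEFS: Dict[str, float] = {"TZ0279": -0.8, "feature_b": 0.5}
INTERCEPT = 0.1


def ml_score(features: Dict[str, float]) -> float:
    """Linear score: intercept plus coefficient-weighted feature values."""
    return INTERCEPT + sum(w * features[name] for name, w in COEFS.items())


def assign_group(score: float, cutoff: float = 0.0) -> str:
    """Split patients by a score cutoff (the cutoff value is an assumption)."""
    return "resistant" if score > cutoff else "sensitive"


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example (label 1 = resistant, 0 = sensitive).
patients: List[Dict[str, float]] = [
    {"TZ0279": 1.5, "feature_b": 0.2},
    {"TZ0279": 0.2, "feature_b": 1.0},
    {"TZ0279": 0.5, "feature_b": 0.4},
    {"TZ0279": 0.4, "feature_b": 0.9},
]
labels = [0, 1, 1, 0]

scores = [ml_score(p) for p in patients]
groups = [assign_group(s) for s in scores]
print(groups, auc(scores, labels))
```

The negative weight on TZ0279 in this sketch mirrors the reported direction (higher TZ0279 values in sensitive patients), but its magnitude is arbitrary.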