| Background and objective: Currently, the diagnosis of hepatocellular carcinoma (HCC) based on the evaluation of microscopic histopathological slides is still indispensable in clinical practice. Evaluation of these slides by experienced pathologists also helps reveal the biological characteristics of HCC. Various well-established histopathological patterns, including grading by tumor cellular differentiation (histologic grade) and immune cell infiltration, have been shown to be associated with the prognosis of HCC patients. Histopathological evaluation of HCC not only provides a definitive diagnosis but also yields significant biological information, as the histological subtypes of HCC have been shown to be related to their genomic characteristics. Clinical trials have shown an association between the occurrence of activating mutations and the response to immunotherapy or tyrosine kinase inhibitors (TKIs) in HCC patients, suggesting a potential role for HCC histopathological subtyping as a biomarker for TKI treatment or immunotherapy. In addition, the response of HCC patients to immunotherapy has been reported to correlate with pre-existing tumoral and peritumoral immune infiltration, indicating that the density of immune infiltration may help predict prognosis after immunotherapy. However, because HCC has remained one of the leading cancer burdens in China over the past decades, thorough evaluation and interpretation of complex HCC pathological images is time-consuming for pathologists. Moreover, high intra-tumor heterogeneity can make it hard to analyze morphological characteristics by visual inspection alone. Recently, the advent of digital whole-slide imaging (WSI) data has offered great opportunities for computer-aided diagnostic (CAD) technologies, which have been shown to improve efficiency, accuracy, and consistency in histopathological
analysis. However, the utility of CAD-based histopathological analysis in HCC has rarely been reported. In this study, we aimed to develop a fully automated pipeline based on HCC histopathological images for HCC diagnosis, survival outcome prediction, and biological feature prediction using machine-learning methods. This study consists of three chapters. In the first chapter, we extracted quantitative image features from HCC histopathological images and built machine-learning models on these features to diagnose HCC and predict patients’ survival outcomes after surgical resection. In the second chapter, we constructed a deep-learning platform using digital slides of HCC to automate HCC diagnosis and predict somatic mutations. In the last chapter, we developed and validated a radiomic signature (Rad score) of immune infiltration in HCC digital slides using radiomic data extracted from contrast-enhanced computed tomography images, which might help us understand the immune landscape of HCC in a non-invasive way. Materials and Methods: In the first chapter, 491 whole-slide hematoxylin and eosin (H&E)-stained histopathological images of HCC tissues (frozen section) from 376 HCC patients, including 402 HCC slides and 89 matched adjacent normal tissue slides, were obtained from The Cancer Genome Atlas (TCGA). Patients with clinical data were randomly partitioned into a training set (70%) and a test set (30%). To challenge the trained algorithms for slide classification and prognostication, tissue microarray (TMA) images from 269 patients from West China Hospital (WCH) were acquired as an external validation set. Using the Bio-Formats package, each slide in .svs format was captured at 20× magnification and divided into overlapping tiles of 1000×1000 pixels. The feature extraction pipeline for image tiles was constructed using CellProfiler. Supervised classification between HCC and adjacent normal tissues was performed using Breiman’s random
forest. The overall survival prediction model was built using random survival forest (RSF). In the second chapter, two datasets of H&E-stained digital slides were collected: (1) WSIs of HCC from TCGA, comprising 481 WSIs with matched whole exome sequencing (WES) data; and (2) TMAs from The Biobank of West China Hospital, comprising 719 TMA dots with 78 matched WES data sets. To construct the deep-learning models, we used 80% of the WSIs in the TCGA dataset for training and 20% for testing. TMAs from The Biobank of West China Hospital were used as the external validation set. To reduce computational time, we used the OpenSlide library to extract each WSI at magnifications of 5× and 20× and then divided it into non-overlapping tiles of 256×256 pixels. The study had two goals: to automatically distinguish HCC from adjacent normal tissues (task 1) and to predict somatic mutations of HCC (task 2). For each task, the prediction probability of each slide (or TMA dot) was generated using two methods: (1) averaging the probabilities of the tiles from the corresponding slide; and (2) calculating the percentage of positively classified tiles (probability ≥0.5) from the corresponding slide. In the last chapter, 142 HCC patients (n = 100 and n = 42 in the training and validation sets, respectively) were subjected to radiomic feature extraction. Imaging features and immunochemistry data of patients in the training set were subjected to elastic-net regularized regression analysis to predict the level of CD8+ T-cell infiltration. High or low CD8+ T-cell infiltration was determined by stratifying all patients into two groups based on the median CD8+ T-cell density. The median Rad score in the training cohort was used to divide patients into high-score (greater than the median) and low-score (less than or equal to the median) groups. Results: In the first chapter, a total of 1733 quantitative image features were extracted from
each histopathological slide. The diagnostic classifier based on 31 features distinguished HCC from adjacent normal tissues in both the test set [area under the receiver operating characteristic curve (AUC) 0.988, 95% CI: 0.975-1.000] and the external validation set (AUC 0.886, 95% CI: 0.844-0.929). The random forest prognostic model using 46 features significantly stratified patients in each set into longer-term and shorter-term survival groups according to their assigned risk scores. Patients with higher risk scores had poorer survival outcomes than those with lower risk scores in both the test set (log-rank P = 0.027) and the external validation set (log-rank P = 0.013). Moreover, the prognostic model showed predictive accuracy comparable to that of the TNM staging system in predicting patients’ survival at different time points after surgery. In the second chapter, the convolutional neural network (CNN) classifier was almost error-free on WSIs in task 1, with the highest AUC reaching 1.000. This model was then validated on TMAs from the WCH dataset, with the highest AUC reaching 0.971. The results of task 2 showed that 7 of the genes examined (ALB, CSMD3, CTNNB1, OBSCN, TP53, MUC4, and RYR2) could be predicted from WSIs in the TCGA dataset using our mutation-prediction CNN, with AUCs ranging from 0.709 to 0.903. Moreover, 4 of the 7 genes predictable in the TCGA dataset (ALB, CSMD3, OBSCN, and RYR2) could also be predicted (AUCs > 0.7) using TMAs from the WCH dataset. In the last chapter, a seven-variable Rad score for CD8+ T-cell infiltration was developed and validated (AUC: training set 0.751, 95% CI 0.656-0.846; validation set 0.705, 95% CI 0.547-0.863). The decision curve indicated the clinical usefulness of the Rad score. A higher Rad score correlated with superior overall and disease-free survival outcomes (P = 0.012 and 0.0088, respectively). Using the histopathological slides, we found that
the Rad score positively correlated with the percentage of tumor-infiltrating lymphocytes (TILs; Spearman rho = 0.51, P < 0.0001). Moreover, the Rad score could discriminate inflamed tumors from immune-desert and immune-excluded tumors (Kruskal-Wallis, P < 0.0001), and higher Rad scores were found in patients with positive programmed cell death ligand 1 (PD-L1) expression in tumor/immune cells, as well as in those with positive programmed cell death protein 1 (PD-1) expression. Conclusion: In the first chapter, we demonstrated the diagnostic and prognostic value of quantitative image features in HCC patients using machine-learning methods. The models we constructed can distinguish tumors from adjacent normal tissues with excellent performance and show prognostic performance comparable to that of the TNM staging system in predicting the overall survival (OS) of HCC patients. We believe that this automatic pipeline can relieve pathologists of repetitive work in the pathological diagnosis of HCC and facilitate prognostic prediction. In the second chapter, deep-learning methods based on H&E-stained digital slides proved to be a promising tool for diagnosing HCC and identifying the mutation status of patients with HCC, successfully predicting alterations in 7 genes. Our work may inspire further studies that extend our classification model to specific histological subtypes of HCC and predict their genetic alterations. In the last chapter, we integrated digital images of HCC with radiomics to construct an efficient, non-invasive, cost-effective tool for predicting immune infiltration in HCC patients and demonstrated the correlation of the Rad score with tumor-infiltrating lymphocytes, PD-1/PD-L1 expression, and tumor immune phenotypes, which may promote the development of non-invasive biomarkers for immunotherapy. Given the limited number of samples included, further clinical trials are needed to validate our findings in large-scale prospective
cohorts. |
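
The two slide-level aggregation methods and the median-based Rad score grouping described in the Materials and Methods can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' code: the function names and toy values are hypothetical, and it assumes that the 0.5 cutoff applies to each tile's predicted probability.

```python
from statistics import mean, median

# Assumption: a tile counts as "positive" when its predicted probability is >= 0.5.
TILE_THRESHOLD = 0.5


def slide_score_mean(tile_probs):
    """Method 1: average the tile probabilities of one slide (or TMA dot)."""
    if not tile_probs:
        raise ValueError("a slide needs at least one tile")
    return mean(tile_probs)


def slide_score_positive_fraction(tile_probs, threshold=TILE_THRESHOLD):
    """Method 2: fraction of tiles in the slide that are classified as positive."""
    if not tile_probs:
        raise ValueError("a slide needs at least one tile")
    positives = sum(1 for p in tile_probs if p >= threshold)
    return positives / len(tile_probs)


def group_by_training_median(training_scores, scores):
    """Label each score 'high' (> training median) or 'low' (<= training median)."""
    cutoff = median(training_scores)
    return ["high" if s > cutoff else "low" for s in scores]


if __name__ == "__main__":
    # Hypothetical tile probabilities for a single slide.
    tiles = [0.9, 0.8, 0.4, 0.6]
    print("mean probability:", slide_score_mean(tiles))
    print("positive fraction:", slide_score_positive_fraction(tiles))

    # Hypothetical Rad scores; the cutoff comes from the training cohort only.
    training = [0.2, 0.5, 0.7]
    print(group_by_training_median(training, [0.5, 0.6, 0.1]))
```

Taking the cutoff from the training cohort alone, as the abstract states, means the validation patients are grouped without the threshold being re-estimated on their own scores.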