| Breast cancer is among the most common cancers in women. Because it metastasizes early through the lymph nodes, local surgical resection results in a poor prognosis. Preoperative neoadjuvant therapy (NAT) is an option for patients with breast cancer, with the advantages of reducing tumor size and improving prognosis. Residual cancer burden (RCB) is a measure of post-NAT tumor response and has been shown to be a significant indicator for long-term prognosis and post-surgical treatment. RCB is measured by visual inspection of hematoxylin and eosin (H&E) stained slides of the lesion under a microscope. Its calculation requires six parameters, including the cancer cellularity within the tumor bed, which is defined as the proportion of the area occupied by malignant epithelial cells. In clinical practice, pathologists identify the tumor bed areas on the slides and estimate the local cellularity in each microscopic field within them. The cellularity of a tumor is then estimated as the mean cellularity over all fields. This estimate is susceptible to subjective factors and inter-rater bias. Moreover, the procedure is time-consuming and requires expertise and experience. Recently, machine learning has received much attention in digital pathology. In this thesis, we aim to develop automated methods for estimating the cellularity of histopathology slides using machine learning. We briefly review classical models such as the support vector machine (SVM), convolutional neural network (CNN), and gradient boosting decision trees (GBDT), introduce the datasets used in this research, and propose or improve the following methods: 1) A method based on nuclei segmentation and classification. Color deconvolution is used to extract the nuclear stain and is combined with adaptive thresholding or Laplacian of Gaussian (LoG) filters to segment the nuclei. To separate the overlapping nuclei that commonly arise during segmentation, we propose an iterative thresholding algorithm. Then, features describing the shape, intensity, texture, and gradient of individual nuclei are extracted and used to train classifiers such as SVM and GBDT that label each nucleus as lymphocyte, benign epithelial, or malignant epithelial. The cellularity is then calculated as the proportion of area occupied by malignant cells. Our method shows good correlation with manual estimation in terms of the intraclass correlation coefficient (ICC) (0.83 with 95% CI of [0.75, 0.86]), compared to the original method (ICC 0.74 with 95% CI of [0.70, 0.77]). 2) A method based on transfer learning. Deep learning, represented by CNNs, is widely applied to mining information from images. In this method, we extract deep features from histopathology images using CNNs pretrained on a large-scale dataset, select the most robust ones using a mutual-information-based approach and principal component analysis (PCA), and use them to train GBDT and SVM models. For cellularity estimation, we use regression and learn-to-rank models. As the outputs of learn-to-rank models are ranking scores, they are mapped to cellularity using k-Nearest Neighbor. Our method shows good correlation with manual estimation in terms of ICC (0.95 with 95% CI of [0.93, 0.96]), compared to other published methods (ICC 0.88 with 95% CI of [0.86, 0.91]). In this thesis, we propose two methods for automated cellularity estimation, and their good correlation with manual estimation suggests their potential to supplant human pathologists in clinical practice. Future research may focus on the creation of large-scale histopathology image datasets and the training of more effective feature representations. |
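The area-based definition of cellularity used in the first method can be illustrated with a minimal sketch. It assumes each segmented object has already been classified and its area measured in pixels; the function names, the label string, and the input layout are hypothetical and are not taken from the thesis.

```python
from statistics import mean

# Hypothetical label name; the abstract names three classes:
# lymphocyte, benign epithelial, and malignant epithelial.
MALIGNANT = "malignant_epithelial"


def patch_cellularity(objects, patch_area):
    """Fraction of a patch covered by objects classified as malignant.

    `objects` is a list of (label, area_in_pixels) pairs and `patch_area`
    is the total patch area in pixels. Both are assumed inputs.
    """
    if patch_area <= 0:
        raise ValueError("patch_area must be positive")
    malignant_area = sum(area for label, area in objects if label == MALIGNANT)
    # Clamp to 1.0 in case overlapping segments over-count the area.
    return min(malignant_area / patch_area, 1.0)


def tumor_cellularity(field_cellularities):
    """Mean cellularity over all fields, as in the manual procedure."""
    return mean(field_cellularities)
```

For example, a 1000-pixel patch containing malignant objects of 200 and 100 pixels and a 50-pixel lymphocyte has a cellularity of 300 / 1000. Averaging such values over every field in the tumor bed gives the slide-level estimate.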
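The mapping from ranking scores to cellularity in the second method might look like the sketch below. The abstract only states that k-Nearest Neighbor is used, so the choice of k, the unweighted average, and the names `knn_map_score`, `ref_scores`, and `ref_cellularity` are all assumptions made for illustration.

```python
def knn_map_score(score, ref_scores, ref_cellularity, k=3):
    """Map one ranking score to a cellularity value.

    Averages the known cellularity of the k reference patches whose
    ranking scores are closest to `score`. The reference patches are
    assumed to be training patches with manual cellularity labels.
    """
    if not ref_scores or len(ref_scores) != len(ref_cellularity):
        raise ValueError("reference lists must be non-empty and equal in length")
    k = min(k, len(ref_scores))
    # Sort reference patches by distance in score space and keep the k closest.
    nearest = sorted(
        zip(ref_scores, ref_cellularity),
        key=lambda pair: abs(pair[0] - score),
    )[:k]
    return sum(cellularity for _, cellularity in nearest) / k
```

Because a ranking model only guarantees the order of its outputs and not their scale, a non-parametric lookup like this turns scores into values in [0, 1] without assuming any particular functional form.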