| Breast cancer is one of the most common malignant tumors in women today. Advances in medical treatment have greatly improved the survival rate, but the incidence has been increasing since the late 1970s, and breast cancer seriously threatens women's physical and mental health. In China, breast cancer ranks first among malignant tumors in women and is characterized by an early age of onset and treatment that begins at a late stage. Early diagnosis is very helpful for later treatment, but the current diagnosis rate is not ideal. As the number of new cancer cases increases worldwide, the heterogeneity of cancer patients has become more and more apparent, so the research and treatment of cancer remains an important challenge facing humanity. To address the heterogeneity of breast cancer, the main challenge is to retype it, that is, to reclassify it into subtypes. This thesis uses machine learning in two respects: to predict whether breast cancer is benign or malignant, and to retype Luminal B breast cancer patients. The first aspect is the prediction of benign and malignant breast cancer. Existing methods mainly focus on prediction accuracy, but in disease prediction it is far more harmful to misclassify a patient as healthy than to misclassify a healthy person as a patient. The prediction uses data derived from digitized images of fine needle aspirates of breast masses, and prediction models are constructed to classify each case as benign or malignant. First, exploratory data analysis is carried out and the data are preprocessed based on its results. Then, multiple algorithms are used for modeling and feature analysis, including AdaBoost, Support Vector Machine, Random Forest, and Neural Network. The second aspect is the retyping of breast cancer. Breast cancer is classified in finer detail, the clinical characteristics of the subtypes are analyzed, and the clinical significance of each subtype is determined, which supports more effective subsequent treatment. Although many researchers have worked on the reclassification of breast cancer and have found possible targets based on the resulting classes, the clinical effect has not been ideal because the results are not related to prognosis. This thesis uses breast cancer data from The Cancer Genome Atlas (TCGA) to retype Luminal B breast cancer, studying a clustering method based on unsupervised learning on the TCGA breast cancer data set. The innovations of this thesis are the following two points: 1) This thesis proposes a clustering algorithm based on weight and density. It calculates the contribution rate of each feature to the intra-cluster distance and the inter-cluster distance, assigns a weight to each feature, and uses the weight to measure the degree of influence of each feature on its cluster. 2) Survival analysis is carried out throughout the genetic analysis, which ensures that the genes finally selected are related to prognosis and that the entire analysis process is closely tied to prognosis. Finally, Luminal B breast cancer was retyped, and genetic analysis and pathological analysis were performed on the resulting subtypes. |
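
The point that accuracy alone can hide missed patients can be illustrated with a small sketch. The labels and the two sets of predictions below are made-up toy values, not results from the thesis, and the helper names (`confusion_counts`, `accuracy`, `sensitivity`) are hypothetical. Malignant is encoded as 1 and benign as 0.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the malignant class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all cases that are classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true, y_pred):
    """Fraction of malignant cases that are detected (recall on class 1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


# Toy data (assumed for illustration): 3 malignant and 7 benign cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Model A misses two malignant cases and raises no false alarms.
pred_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Model B finds every malignant case but flags two benign cases.
pred_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, accuracy(y_true, pred), sensitivity(y_true, pred))
```

Both toy models reach the same accuracy, yet model A detects only one of the three malignant cases. This is why a model for this task would plausibly be compared on sensitivity as well as accuracy.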
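
The feature-weighting idea in the first innovation can be sketched as follows. The abstract does not give the actual formula, so everything here is an assumption made for illustration: each feature is scored by the ratio of its inter-cluster dispersion to its intra-cluster dispersion, the scores are normalized to sum to 1, and the weights are then used in a weighted Euclidean distance. The density component of the proposed algorithm is not modeled, and the names `feature_weights` and `weighted_distance` are hypothetical.

```python
import math
from collections import defaultdict


def feature_weights(points, labels, eps=1e-12):
    """Assumed scoring rule: weight_j is proportional to inter_j / intra_j."""
    n_features = len(points[0])
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    global_mean = [
        sum(p[j] for p in points) / len(points) for j in range(n_features)
    ]

    scores = []
    for j in range(n_features):
        intra = 0.0  # spread of feature j around each cluster centroid
        inter = 0.0  # spread of the cluster centroids around the global mean
        for members in groups.values():
            centroid_j = sum(p[j] for p in members) / len(members)
            intra += sum((p[j] - centroid_j) ** 2 for p in members)
            inter += len(members) * (centroid_j - global_mean[j]) ** 2
        scores.append(inter / (intra + eps))

    total = sum(scores)
    if total == 0:
        # No feature separates the clusters: fall back to uniform weights.
        return [1.0 / n_features] * n_features
    return [s / total for s in scores]


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


# Toy data (assumed): feature 0 separates the two clusters, feature 1 does not.
points = [[0.0, 1.0], [0.2, 5.0], [0.1, 3.0],
          [5.0, 1.0], [5.2, 5.0], [5.1, 3.0]]
labels = [0, 0, 0, 1, 1, 1]

weights = feature_weights(points, labels)
print(weights)
print(weighted_distance(points[0], points[3], weights))
```

In this toy case almost all of the weight falls on feature 0, so distances between samples are driven by the feature that actually distinguishes the clusters. In an iterative clustering procedure, weights of this kind would typically be recomputed after each reassignment step.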