| Breast cancer has high morbidity and mortality and seriously threatens women's health. Because its pathogenesis is difficult to determine and its onset is insidious, early-stage breast cancer is hard to detect and rarely draws patients' attention. Many patients miss the best window for treatment, so their condition deteriorates faster and can become life-threatening. Manual tumor screening is inefficient and highly subjective, which can cost breast cancer patients the chance of a cure. In recent years, the volume of data in many fields in China has grown exponentially, and the growth of medical data has brought pressure and challenges to medical staff. Compared with general data, medical data is heterogeneous, redundant, and privacy-sensitive. Medical personnel must invest substantial manpower and material resources to extract useful information from highly varied data in order to diagnose disease. Given the limitations of traditional medical diagnosis methods, this article applies supervised classification algorithms to mine data on breast cancer patients and uses the resulting models to make predictions. The main work of this article is as follows: (1) The traditional methods of breast cancer diagnosis and treatment are reviewed, and the shortcomings of clinical diagnosis and imaging diagnosis are pointed out. The open UCI database is selected as the data source, and the decision tree algorithm is applied to clinical prediction of breast cancer. Features extracted from digitized images of fine-needle aspirates taken from patients with breast masses are used to predict whether a tumor is benign or malignant. Factor analysis is used to select the most influential of the many attributes as the test attribute of the current node, and the model is built recursively from top to bottom. On this basis, the
optimal depth of the decision tree is explored in order to construct the best decision tree model. The experimental results show that the decision tree algorithm predicts benign and malignant breast tumors well, but the model is not stable, and the accuracy on the training set needs to be improved. (2) The decision tree model is easy to explain and understand, which makes it well suited to mining complex, specialized medical data, but finding the globally optimal decision tree is difficult. To further improve prediction accuracy, this paper proposes an improved k-nearest neighbor algorithm, the W-kNN algorithm. The model first normalizes the sample attributes, then weights the distances, and finally tunes the value of k. Experiments show that the improved kNN algorithm better predicts breast tumor type and cancer recurrence. (3) The algorithm is implemented in software, and an auxiliary prediction system for breast cancer tumors is built to make the model more practical. This not only improves the efficiency and quality of data mining but also provides effective support for the development of medical institutions, while supporting individualized breast cancer treatment decisions and precision medical services. |
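
The three steps named for the W-kNN model (normalize the attributes, weight the distances, tune k) can be illustrated with a minimal sketch. The abstract does not give the exact normalization or weighting scheme, so the min-max scaling and the inverse-distance vote below are assumptions, as are the function names and the toy data, which are hypothetical and not taken from the UCI records.

```python
import math
from collections import defaultdict


def min_max_fit(rows):
    """Return per-attribute (min, max) pairs computed from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def min_max_apply(row, bounds):
    """Scale one row per attribute using the training bounds; constant attributes map to 0."""
    return [
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(row, bounds)
    ]


def weighted_knn_predict(train_rows, train_labels, query, k=3, eps=1e-9):
    """Predict a label by inverse-distance-weighted voting among the k nearest rows."""
    # Step 1: normalize attributes (assumed here: min-max scaling).
    bounds = min_max_fit(train_rows)
    scaled_train = [min_max_apply(row, bounds) for row in train_rows]
    scaled_query = min_max_apply(query, bounds)

    # Euclidean distance from the query to every training sample.
    distances = [
        (math.dist(row, scaled_query), label)
        for row, label in zip(scaled_train, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])

    # Step 2: weight the distances (assumed here: vote weight 1 / (d + eps)).
    votes = defaultdict(float)
    for distance, label in distances[:k]:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# Hypothetical toy data with two attributes per sample.
toy_rows = [
    [1.0, 10.0], [2.0, 12.0], [1.5, 11.0],
    [8.0, 40.0], [9.0, 42.0], [8.5, 45.0],
]
toy_labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

# Step 3: k is a tunable parameter; here it is simply passed in.
print(weighted_knn_predict(toy_rows, toy_labels, [8.2, 41.0], k=3))
```

In practice, k would be chosen by comparing validation accuracy over a range of candidate values rather than fixed by hand as in this sketch.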