In recent years, breast cancer has become one of the leading threats to women's health, and early treatment remains the basis for saving patients' lives. With the development of artificial intelligence, using machine learning to build intelligent breast cancer diagnosis systems has become a research hotspot in medicine. The diagnosis of breast cancer poses three difficulties. First, the sample data are imbalanced: normal samples far outnumber diseased samples, so a model can report a deceptively high accuracy while correctly diagnosing few diseased cases. A network with strong generalization ability is therefore needed, and this paper uses the AdaBoost algorithm to address the problem. Second, there are a large number of medical indicators. Identifying those related to breast cancer speeds up diagnosis and, more importantly, reveals the essential factors affecting the disease, which reduces the cost of physical examinations and supports scientifically grounded treatment plans. This paper handles the problem with traditional dimensionality reduction methods such as Principal Component Analysis (PCA). Third, from the perspective of neural network theory, breast cancer diagnosis is a multiple-input, single-output mapping problem: whatever neural network is used, its output is a nonlinear combination of its inputs. Whether the input indicators adequately reflect the nature of the disease therefore requires comprehensive analysis (data mining), and the existing sample indicators need to be supplemented and expanded. Based on the common-sense observation that the medical indicators of diseased and normal samples should differ substantially, this paper proposes adding, as a basic index, the ratio of the squared difference between a sample and the healthy-sample mean to the standard deviation, so that this ratio takes part in diagnosis and classification as a high-dimensional mapping of the data.

To address these three problems, the ZsymBoost algorithm is developed on the basis of AdaBoost. Because ZsymBoost requires a choice of weak classifier, and different neural networks have different characteristics and ranges of applicability, this paper proposes a single-core network diagnosis system and a composite-core network diagnosis system. The main work and results are as follows. (1) ZsymBoost builds on AdaBoost but pays particular attention to diseased samples: when updating the data weights, it increases the weight of diseased samples misclassified as healthy and reduces the weight of healthy samples misclassified as diseased. (2) In the single-core network diagnosis system, the Back Propagation neural network (BP), the Radial Basis Function neural network (RBF), and Naive Bayes are each used to build a model: BP-ZsymBoost, RBF-ZsymBoost, and Naive Bayes-ZsymBoost. Experiments on the Wisconsin breast cancer data set show that the BP-ZsymBoost model, with BP as the core network, gives the best diagnostic performance and outperforms both the plain BP model and BP-AdaBoost in all respects. (3) In the composite-core network diagnosis system, this paper presents the BRB-ZsymBoost model, built from BP, RBF, and Naive Bayes, which reduces the weight of classifiers whose training accuracy is below 0.97 and increases the weight of classifiers whose training accuracy is above 0.97. Experiments show that the composite-core network is superior to the single-core network for breast cancer diagnosis. (4) During data processing, the standard deviations of each attribute are compared between benign and malignant samples. For each feature column whose standard deviations differ markedly, a new column is added after it: the ratio of the squared difference between the column's value and the benign-sample mean of that feature to the standard deviation, which expands the data features into a higher dimension. Under the BRB-ZsymBoost model, experiments show that this processing is effective and improves diagnostic accuracy, which reaches 98.83%. It offers a new approach to data preprocessing research.
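
The asymmetric weight update described in item (1) can be illustrated with a minimal sketch. It assumes a standard discrete AdaBoost round as the starting point and then rescales the two kinds of error differently. The function name and the factors `fn_boost` and `fp_damp` are hypothetical choices made for illustration; the abstract states only the direction of the adjustment, not its size.

```python
import math

def asymmetric_weight_update(weights, y_true, y_pred, fn_boost=1.5, fp_damp=0.75):
    """One round of sample-weight updates; labels: 1 = diseased, 0 = healthy.

    fn_boost and fp_damp are hypothetical factors, not values from the paper.
    Returns the normalized weights and the classifier weight alpha.
    """
    # Weighted error of the current weak classifier.
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)  # classifier weight, as in AdaBoost

    updated = []
    for w, t, p in zip(weights, y_true, y_pred):
        if t == p:
            w *= math.exp(-alpha)  # correctly classified: weight shrinks
        else:
            w *= math.exp(alpha)   # misclassified: usual AdaBoost growth
            if t == 1:
                w *= fn_boost      # diseased predicted healthy: extra increase
            else:
                w *= fp_damp       # healthy predicted diseased: damped
        updated.append(w)

    norm = sum(updated)
    return [w / norm for w in updated], alpha

# Six samples with equal starting weight; sample 0 is a missed diseased case
# and sample 3 is a healthy case flagged as diseased.
new_w, alpha = asymmetric_weight_update(
    [1 / 6] * 6, [1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0]
)
```

With these assumed factors, the missed diseased case ends the round with a larger weight than the false alarm, so the next weak classifier is pushed harder toward catching diseased samples.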
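
The feature expansion in item (4) can be sketched in the same spirit. The abstract does not say which standard deviation forms the denominator or how a "marked" difference is judged, so this sketch assumes the benign-sample standard deviation and a hypothetical ratio threshold `std_gap`. For simplicity it appends the new columns at the end of each row rather than directly after the source column.

```python
import statistics

def expand_features(rows, labels, std_gap=1.5):
    """Append (x - benign_mean)^2 / benign_std for columns whose class stds differ.

    labels: 0 = benign, 1 = malignant.
    std_gap is a hypothetical selection threshold, not a value from the paper.
    Returns the expanded rows and the indices of the selected columns.
    """
    benign = [r for r, y in zip(rows, labels) if y == 0]
    malignant = [r for r, y in zip(rows, labels) if y == 1]

    selected = []  # (column index, benign mean, benign std)
    for j in range(len(rows[0])):
        b_col = [r[j] for r in benign]
        m_col = [r[j] for r in malignant]
        b_std = statistics.pstdev(b_col)
        m_std = statistics.pstdev(m_col)
        hi, lo = max(b_std, m_std), max(min(b_std, m_std), 1e-12)
        # Keep the column only if the two class spreads differ by the
        # assumed factor and the benign std can serve as a denominator.
        if b_std > 0 and hi / lo >= std_gap:
            selected.append((j, statistics.fmean(b_col), b_std))

    expanded = [
        list(r) + [(r[j] - mu) ** 2 / sd for j, mu, sd in selected]
        for r in rows
    ]
    return expanded, [j for j, _, _ in selected]

# Toy data: column 0 spreads far more among malignant samples than benign ones,
# column 1 has the same spread in both classes.
rows = [[1.0, 5.0], [3.0, 5.5], [6.0, 5.0], [12.0, 5.5]]
labels = [0, 0, 1, 1]
expanded, picked = expand_features(rows, labels)
```

In this toy data only column 0 is selected, and the added value grows quickly for samples far from the benign mean, which is the separation the abstract attributes to the extra feature.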