Cervical cancer is a malignant tumor that endangers women’s health, with high morbidity and mortality. To help reduce both, this paper applies data mining algorithms to a cervical cancer data set and provides a reference for the prevention and auxiliary diagnosis of the disease. To address the high incidence, the Apriori association rule algorithm is used to mine risk factors that can inform prevention. To address the high mortality, an ensemble learning model is constructed to classify cervical cancer data efficiently and accurately in support of diagnosis. In the prevention study, data features are first encoded according to problem category, and continuous data are then discretized based on the actual situation and the data distribution. Strong association rules are obtained by setting a minimum support of 30% and a minimum confidence of 80%. The resulting rules show that smoking, use of hormonal contraceptives, a large number of pregnancies, a large number of sexual partners, and first sexual intercourse at a young age are associated with a higher risk of cervical cancer. Based on these rules, women are advised to avoid smoking, keep to a single sexual partner, have children at an appropriate age, receive HPV vaccination, and have regular check-ups. In the diagnosis study, the data are first preprocessed: missing values are handled by deletion and imputation, with data having a large proportion of missing values deleted directly and data with a small proportion of missing values filled in using the EM algorithm. A random forest algorithm is then used to score feature importance, and suitable features are selected for modeling. An ensemble random forest model and an ensemble support vector machine (SVM) model are constructed to address the problem of unbalanced data. The ensemble random forest achieves an accuracy of 90% and a sensitivity of 81.82% in the diagnosis of cervical cancer, and the ensemble SVM achieves an accuracy of 88.67% and a sensitivity of 90.91%. The results show that the sensitivity of the ensemble models is improved to some extent.
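To illustrate how the minimum support (30%) and minimum confidence (80%) thresholds select strong association rules, the following is a minimal Apriori-style sketch in plain Python. It is not the paper's implementation: the records and the item names (`smokes`, `contraceptives`, `high_risk`) are hypothetical toy values chosen only to show the mechanics, and the helper functions `support`, `frequent_itemsets`, and `strong_rules` are illustrative names.

```python
from itertools import combinations

# Hypothetical toy records (NOT the paper's data): each record is the set of
# discretized items observed for one patient.
records = (
    [{"smokes", "high_risk"}] * 5
    + [{"smokes"}]
    + [{"contraceptives", "high_risk"}]
    + [{"contraceptives"}] * 3
)


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= r for r in records) / len(records)


def frequent_itemsets(records, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    singletons = {frozenset([i]) for r in records for i in r}
    level = {s for s in singletons if support(s, records) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, records)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are
        # frequent and the candidate itself meets the support threshold.
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c, records) >= min_support
        }
    return frequent


def strong_rules(records, min_support=0.30, min_confidence=0.80):
    """Return (antecedent, consequent, support, confidence) for strong rules."""
    frequent = frequent_itemsets(records, min_support)
    rules = []
    for itemset, s in frequent.items():
        for i in range(1, len(itemset)):
            for ante in combinations(itemset, i):
                a = frozenset(ante)
                # confidence(A -> B) = support(A and B) / support(A)
                conf = s / frequent[a]
                if conf >= min_confidence:
                    rules.append((a, itemset - a, s, conf))
    return rules


if __name__ == "__main__":
    for a, c, s, conf in strong_rules(records):
        print(sorted(a), "->", sorted(c), f"support={s:.2f}", f"confidence={conf:.2f}")
```

In this toy data, `{smokes, high_risk}` appears in half of the records and so passes the 30% support threshold, while `{contraceptives, high_risk}` appears in only one record and is pruned before any rule is formed from it.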