A Type Of Data Mining Algorithm And Its Application In Intelligent Diagnosis Of Cervical Cancer

Posted on:2020-08-08

Degree:Master

Type:Thesis

Country:China

Candidate:D Lv

Full Text:PDF

GTID:2404330599953927

Subject:Mathematics

Abstract/Summary:

With the development of information technology, the medical field has produced a huge amount of data, notable not only for its volume, variety and update speed but also for the potential value of the data itself. Mining this latent information is of great significance for medical examinations, cancer treatment, and medical resource allocation. In this paper, a type of data mining algorithm is used to analyze clinical data on cervical cancer; related factors such as pathogenic factors, examination methods and recommended treatment methods are explored, and a corresponding classification decision model is established. The paper mainly completes the following two aspects:

1. The data were pre-processed according to the characteristics of the UCI Cervical Cancer (Risk Factors) data set, which contains medical data from Caracas Hospital, Venezuela. Firstly, the data contain missing values, which are handled by combining direct deletion with constant imputation. Then, because the data are unbalanced, an upsampling method is applied. Finally, because the data contain continuous attributes, the equal-width binning method is used to discretize them, and the quality of the discretization is measured by information value.

2. A type of data mining algorithm is used to evaluate the risk factors in the cervical cancer clinical data, a task that can be indirectly converted into a binary classification problem. The paper takes decision tree (DT), random forest (RF) and support vector machine (SVM) as its main line, and the experiments were carried out in sequence. Firstly, a decision tree classification model is created and the diagnosis rates for diseased and disease-free cases are calculated; the model is then optimized twice. Optimization (1): optimization based on the minimum number of samples contained in the leaf nodes (MSSOLN-DT). Optimization (2): pruning optimization (PO-DT) of the decision tree. The decision tree is compared with the two optimized models. The results show that MSSOLN-DT has a minimum resubstitution error of 0.0550 and a 10-fold cross-validation error of 0.1267. The optimized DT structure is simpler than the classic one. Then, an SVM model is constructed with a linear kernel function, and the diagnosis rates for diseased and disease-free cases are calculated. Finally, a random forest model is constructed.

The paper compares and analyzes the models created by decision tree, support vector machine and random forest. The comparison shows that the model constructed by random forest performs well in the classification and recognition of cervical cancer. When the class label is “Hinselmann”, the accuracy reaches 98.21%; when the class label is “Schiller”, the accuracy is the lowest among the four class labels but still reaches 91.94%.
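
The pre-processing steps named in the abstract (upsampling, equal-width binning, and information value) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes binary 0/1 class labels, random oversampling with replacement as the upsampling method (the abstract does not say which variant was used), and a smoothing constant `eps` to avoid taking the logarithm of zero. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_value(bins, labels, eps=0.5):
    """IV = sum over bins of (p - q) * ln(p / q), where p and q are the shares
    of positive and negative samples in each bin. eps is an assumed smoothing term."""
    k = len(set(bins))
    pos_total = sum(1 for y in labels if y == 1)
    neg_total = len(labels) - pos_total
    pos = Counter(b for b, y in zip(bins, labels) if y == 1)
    neg = Counter(b for b, y in zip(bins, labels) if y == 0)
    iv = 0.0
    for b in set(bins):
        p = (pos[b] + eps) / (pos_total + eps * k)
        q = (neg[b] + eps) / (neg_total + eps * k)
        iv += (p - q) * math.log(p / q)
    return iv


def upsample_minority(rows, labels, seed=0):
    """Random oversampling: repeat minority-class rows until both classes match in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    n_extra = max(counts.values()) - counts[minority]
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return list(rows) + extra, list(labels) + [minority] * n_extra
```

A higher information value suggests that the binned attribute separates the two classes better; an attribute whose bins contain the same share of positives and negatives scores zero.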

Keywords/Search Tags:

Data mining, Cervical Cancer, Medical data, Intelligent diagnosis, Classification

Related items

1	Research On Medical Intelligent Diagnosis And Decision Support Based On Imbalanced Data
2	Classification On Esophageal Cancer X-ray Image Of Xinjiang Kazak Based On Data Mining
3	Study Of Classification-Association Rules On Zhongjing Formulas' Data Mining
4	Research On Predictive Diagnosis Of Breast Cancer Based On Classification Supervised Learning Algorithm
5	Study On The Application Of Data Mining For TCM Diagnosis And Prescription
6	Classification Techniques For Imbalanced Data And Applications In Intelligent Medical Decision Support
7	Application Study Of Data Mining In Intelligent Identification Of Metabolic Syndrome In Physical Examination Population
8	Intelligent Diagnostic Method For T2DM Combined With CHD Based On Multi-modal Medical Data
9	An Analysis Of Medical Big Data And The Design And Realization Of Intelligent Surveillance System
10	Study On Medical Data Classification Based On Fuzzy Decision Tree