Research On Prostate Cancer Related Data Based On Data Mining

Posted on: 2017-02-06  Degree: Master  Type: Thesis
Country: China  Candidate: P Ge  Full Text: PDF
GTID: 2284330503458196  Subject: Information and Communication Engineering
Abstract/Summary:
Prostate cancer is one of the most common malignant tumors of the male genitourinary system. In Western European and American countries, its incidence ranks first among all cancers in men and its mortality ranks second, just after that of lung cancer. The incidence of prostate cancer in China is clearly lower than in Western countries, but it has increased significantly in recent years due to various factors. Primary tumors usually arise in the peripheral zone. Because there are no specific symptoms in the early stage, the disease is often not clinically diagnosed until its late stage, resulting in a poor prognosis. This makes the early diagnosis of prostate cancer an important subject for urologists. With the continuing informatization of the medical industry, more and more clinical data on prostate cancer are being stored in medical databases. How to find the hidden information and regularities in those data, so as to support the diagnosis and treatment of prostate cancer and, ultimately, medical research as a whole, is a problem that urgently needs to be solved.

This thesis analyzes clinical data on prostate cancer using data mining techniques. Some clinical features and developmental regularities of prostate cancer were identified and summarized, and an early prediction model based on a GA_BP neural network was then constructed.

To simplify the index structure and reduce the complexity of the problem, the sample space was processed with attribute selection (nonparametric tests and bivariate correlation) and principal component analysis, reducing the data dimensionality from 9 to 4. To compensate for the deficiencies of the basic BP neural network, namely its tendency to fall into local minima and its slow convergence, the network was optimized by applying the genetic algorithm and the LM (Levenberg-Marquardt) algorithm.
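The dimensionality reduction step described above (nine indicators reduced to four principal components) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the abstract does not give the thesis's actual data, preprocessing or software, so the data here are synthetic random numbers, the columns are standardized before the decomposition, and the helper name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top n_components principal components.

    Hypothetical helper for illustration; not the thesis's implementation.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column (zero mean, unit variance).
    # Assumes no column is constant, otherwise the division would fail.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    # Share of total variance carried by each component.
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]


# Synthetic stand-in for a table of 200 patients with 9 clinical indicators.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))

scores, ratio = pca_reduce(X, 4)
print(scores.shape)   # (200, 4)
print(ratio.sum())    # fraction of variance retained by the 4 components
```

In practice the number of retained components would be chosen from the explained-variance ratios of the real data; the value 4 is taken from the abstract, not derived here.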
The experimental and simulation results showed that these improvements achieved the desired effects to some extent. Ultimately, the Youden Index, sensitivity, specificity and accuracy on the training set were 0.7661, 0.8661, 0.9000 and 0.8903, respectively; on the test set they were 0.6853, 0.8364, 0.8489 and 0.8454, respectively. All indicators of the prediction model reached a high level.

The prediction model was then compared with the diagnostic indicators commonly used in clinical practice for prostate cancer, namely tPSA, %fPSA and PSAD. The areas under the ROC curve for the model, tPSA, %fPSA and PSAD were 0.933, 0.798, 0.827 and 0.894, respectively. This shows that the prediction model constructed in this study has better discriminative ability than these indicators. It could assist clinicians in the diagnosis and treatment of prostate cancer and reduce unnecessary biopsies, and therefore has important clinical application value.
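The reported Youden Index values are consistent with the standard definition J = sensitivity + specificity - 1. The short sketch below checks this arithmetic against the figures quoted in the abstract; the function names and the confusion-matrix counts in the second example are hypothetical and are not taken from the thesis.

```python
def rates_from_counts(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Sensitivity and specificity as quoted in the abstract.
train_j = youden_index(0.8661, 0.9000)
test_j = youden_index(0.8364, 0.8489)
print(round(train_j, 4), round(test_j, 4))  # 0.7661 0.6853

# Hypothetical counts, for illustration only.
sens, spec = rates_from_counts(tp=90, fn=10, tn=80, fp=20)
print(sens, spec)  # 0.9 0.8
```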
Keywords/Search Tags: data mining, prostate cancer, principal component analysis, BP neural network, genetic algorithm, ROC curve