Detection Method For Disease Based On Imbalance Data Classification Model

Posted on: 2016-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: T T Bi
Full Text: PDF
GTID: 2298330467987312
Subject: Software engineering
Abstract/Summary:
Imbalanced data sets are a form of real-world observational data found widely in many fields, such as computer science, economics, biology and medicine. Although such data reflect the natural distribution of the phenomena observed, in practice people tend to care most about the characteristics of the minority class. For instance, in credit card fraud detection, the vast majority of users are legitimate, yet the goal is to predict the potentially fraudulent ones from the data; in corporate bankruptcy risk prediction, bankrupt companies are in the minority, and what business managers are really concerned about is whether current operating conditions may lead to bankruptcy; in oil exploration, oil-bearing areas are few, yet they are what exploration researchers devote themselves to finding. Disease detection is a typical application of imbalanced data sets. In medical diagnosis, healthy people make up the majority of a real data set, but the focus is on the minority who are sick, with the aim of predicting the occurrence of disease from a small number of data characteristics. Predicting and classifying unknown cases from the existing features of a minority class has always been one of the most challenging problems in the field.
Against this background, this paper addresses the accurate classification and prediction of unknown patients in disease detection using known models. Among many diseases, breast cancer has received extensive attention in recent years because of its high incidence and serious impact, and a growing body of related research has been conducted. In this paper, a new diagnostic process for breast cancer is put forward based on an imbalanced data model. First, the state of research at home and abroad on breast Computer-Aided Diagnosis (CAD) models is introduced, along with the contributions and progress made by researchers worldwide. Secondly, the method of X-ray radiograph feature extraction used in the diagnosis process is introduced, by which image characteristics are converted into numerical feature sets available for subsequent calculations; on the basis of these quantitative features, a rough set attribute reduction algorithm is used for feature reduction of the breast data sets. Finally, because actual breast cancer data sets are imbalanced, the diagnostic accuracy of traditional CAD models in classification falls sharply under the influence of the skewed decision surface and data submergence.
To solve the problem above, a research strategy based on an imbalanced data set model is introduced. With full consideration of typical factors such as lack of information, data submergence and information loss after sampling, an imbalanced data re-sampling strategy based on cluster boundary sampling is put forward. Combined with an ensemble learning method based on the support vector machine (SVM), this yields a breast cancer diagnosis strategy that addresses the imbalanced data set classification problem from both the data and the algorithm aspects. In the experimental construction and analysis, the effectiveness and stability of the proposed method are verified on the X-ray radiography database of the University of Florida and on UCI data sets. The comprehensive step-by-step strategy is applied to automatic early diagnosis of breast cancer. The experimental results show that the proposed method can effectively improve the classification accuracy of breast cancer detection, which provides some practical guidance for doctors' diagnoses.
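The abstract does not spell out the cluster boundary sampling algorithm, so the following is only an illustrative sketch of the general idea, not the thesis's method. It assumes that the majority class is first grouped with a plain k-means, and that from each cluster only the samples closest to the minority class (those nearest the decision boundary) are kept, so that the majority class shrinks to roughly the minority size. The function names (`kmeans`, `boundary_undersample`), the quota rule and the parameters `k` and `seed` are all hypothetical choices made for this example.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns a list of k clusters (some may be empty)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
    return clusters

def boundary_undersample(majority, minority, k=3, seed=0):
    """Assumed scheme: per majority cluster, keep the samples nearest to
    the minority class, with a quota proportional to the cluster size."""
    target = len(minority)
    kept = []
    for cluster in kmeans(majority, k, seed=seed):
        if not cluster:
            continue
        quota = max(1, round(target * len(cluster) / len(majority)))
        # Rank by distance to the closest minority sample (boundary first).
        ranked = sorted(cluster, key=lambda p: min(dist(p, m) for m in minority))
        kept.extend(ranked[:quota])
    return kept

if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic 2-D data: 60 "healthy" samples versus 6 "sick" samples.
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
    minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(6)]
    reduced = boundary_undersample(majority, minority, k=3)
    print(len(majority), "majority samples reduced to", len(reduced))
```

The reduced majority set, together with the untouched minority set, could then be passed to an SVM-based ensemble of the kind the abstract mentions; that step is omitted here because the abstract gives no details about how the ensemble is built.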
Keywords/Search Tags: Computer-Aided Diagnosis, Image Data Mining, Support Vector Machine, Clustering Sampling, Ensemble Learning