
Research And Implementation Of Big Medical Data Mining Algorithms Based On Hadoop

Posted on: 2015-12-15  Degree: Master  Type: Thesis
Country: China  Candidate: N Wang  Full Text: PDF
GTID: 2298330467462042  Subject: Electronics and Communications Engineering
Abstract/Summary:
Studies show that chronic diseases such as hypertension and diabetes, along with their complications, place a heavy burden on both individuals and society, which makes their prevention and treatment urgent. Because the pathogenesis of chronic diseases is complex and variable, accurate early diagnosis remains difficult. However, the onset and progression of chronic diseases follow certain regularities, and the harm can be significantly reduced by evaluating the patient's condition and intervening appropriately. From the perspective of preventive medicine, it is therefore necessary to build predictive models with Data Mining (DM) techniques to help doctors with diagnosis and clinical supervision. With the rapid growth of medical data, however, existing methods and techniques appear to be no longer practical, and it may be necessary to turn to distributed environments such as Hadoop and related techniques. In brief, this thesis aims to handle big medical data and to provide a scientific basis for the prevention and treatment of chronic diseases.
The research focuses on the design of DM schemes for chronic diseases; the selection of DM algorithms; the improvement, parallelization and evaluation of the Decision Tree (DT); and the design and implementation of a Graphical User Interface (GUI). For hypertension and Type 2 Diabetes Mellitus (T2DM), DM schemes and the necessary input and output parameters are designed on the basis of authoritative medical guidelines. Among the many DM algorithms, the C4.5 DT is selected, improved in two respects (stability and scale-up performance), and implemented in Java. The BCTree algorithm combines C4.5 with Bagging, while the MRC4.5 algorithm is based on MapReduce. The new algorithms are then used to build models from actual medical data, and the experimental results verify the feasibility of the DM schemes and the improvements to C4.5.
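The abstract does not describe how MRC4.5 distributes the work, so the following is only a minimal sketch of one common way to express the C4.5 split criterion (gain ratio) in a map/reduce style. It is written in Python for brevity, although the thesis implementation is in Java. The function names, the attribute names `bp` and `smoker`, and the toy records are all hypothetical, and only categorical attributes are handled.

```python
import math
from collections import Counter, defaultdict


def map_record(record, label):
    """Map step: emit ((attribute, value, label), 1) for each attribute of one record."""
    for attr, value in record.items():
        yield (attr, value, label), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def entropy(counts):
    """Shannon entropy (base 2) of a collection of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain_ratios(totals, label_counts):
    """Gain ratio per attribute, computed only from the aggregated counts."""
    n = sum(label_counts.values())
    base = entropy(label_counts.values())
    by_attr = defaultdict(lambda: defaultdict(Counter))
    for (attr, value, label), count in totals.items():
        by_attr[attr][value][label] += count
    ratios = {}
    for attr, values in by_attr.items():
        sizes = [sum(c.values()) for c in values.values()]
        conditional = sum(
            size / n * entropy(c.values())
            for size, c in zip(sizes, values.values())
        )
        split_info = entropy(sizes)
        ratios[attr] = (base - conditional) / split_info if split_info > 0 else 0.0
    return ratios


# Hypothetical toy data, not taken from the thesis.
records = [
    ({"bp": "high", "smoker": "yes"}, "risk"),
    ({"bp": "high", "smoker": "no"}, "risk"),
    ({"bp": "normal", "smoker": "yes"}, "ok"),
    ({"bp": "normal", "smoker": "no"}, "ok"),
]
pairs = [kv for rec, label in records for kv in map_record(rec, label)]
totals = reduce_counts(pairs)
label_counts = Counter(label for _, label in records)
ratios = gain_ratios(totals, label_counts)
best = max(ratios, key=ratios.get)  # attribute chosen for the split
```

In this formulation the mappers only need to see individual records, and the split decision is made from the summed counts, which is why counting-based tree induction is often regarded as a natural fit for MapReduce.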
An architecture for a big medical DM system is proposed to guide the development of the WeHealth Medical Data Mining Platform, in which function interfaces are designed and implemented, the chronic disease DM algorithms are integrated, and the models are visualized.
In this thesis, complex medical diagnosis, prognosis evaluation and clinical decision procedures are transformed into clear, programmable decision-making processes, and feasible DM schemes are proposed. Compared with C4.5, BCTree achieves better accuracy and sensitivity, while MRC4.5 shows its adaptability to big medical data in size-up and speed-up experiments. The WeHealth Medical Data Mining Platform has a friendly GUI and performs well in big medical data mining tasks. Once these schemes, algorithms and software are further refined, they can be used in the diagnosis and clinical supervision of chronic diseases and can help prevent and treat them.
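For readers unfamiliar with the evaluation terms used above, the following Python sketch shows how accuracy, sensitivity, speed-up and size-up are commonly defined. The abstract does not give the exact formulas used in the thesis, so these standard definitions are an assumption, and all numbers below are invented for illustration and are not results from the thesis.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of truly positive cases that the model detects (recall)."""
    return tp / (tp + fn)


def speedup(time_one_node, time_m_nodes):
    """Same data set: run time on one node divided by run time on m nodes."""
    return time_one_node / time_m_nodes


def sizeup(time_base_data, time_scaled_data):
    """Same cluster: run time on the enlarged data set divided by run time on the base data set."""
    return time_scaled_data / time_base_data


# Invented confusion-matrix counts and timings (seconds), for illustration only.
acc = accuracy(tp=40, tn=50, fp=5, fn=5)
sens = sensitivity(tp=40, fn=10)
spd = speedup(time_one_node=120.0, time_m_nodes=40.0)
szu = sizeup(time_base_data=30.0, time_scaled_data=90.0)
```

Under these definitions, a speed-up close to the number of nodes and a size-up that grows no faster than the data indicate that a parallel algorithm scales well.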
Keywords/Search Tags: chronic diseases, Data Mining, C4.5, Bagging, MapReduce