
Research And Application Of Bayesian Algorithm Based On Cloud Computing For Disease Predicting

Posted on: 2017-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: H H Fu
Full Text: PDF
GTID: 2180330485451826
Subject: Computer application technology
Abstract/Summary:
Disease diagnosis is an important task in the medical field, and medical institutions of all kinds are collecting ever more patient samples. Predictions made by humans are inevitably subject to error because of subjective factors such as experience and decision-making ability, so both accuracy and efficiency need to be improved. The prediction theory of TCM (Traditional Chinese Medicine) holds that health is closely associated with the internal and external environment. The class-attribute joint probability used by the Bayesian classifier, which is grounded in probability and statistics, is difficult to estimate. A single-machine environment cannot process large samples within the expected time. An ideal classification model should connect features with diseases while improving classification accuracy and extensibility. This thesis addresses the problems above through the following work.

Firstly, this thesis proposes an improved weighted instance multinomial Naive Bayes (IWIMNB) algorithm based on cosine similarity and local learning. IWIMNB weakens the conditional independence assumption by using local training samples to construct a high-quality classifier in the neighborhood of the validation samples. Cosine similarity measures the distance between samples and is used as the instance weight when training the improved classification model. The comparison experiments show that IWIMNB is highly practical and achieves better precision.

Secondly, this thesis applies association rules to the weighted average one-dependence Bayes classifier, so as to take into account both the dependencies between non-parent features and the contributions of the individual AODE models. To speed up the generation of association rules, a distributed frequent itemsets mining algorithm (DFIMA), implemented on Spark and based on matrix pruning, is proposed with the aim of reducing useless candidate itemsets and system I/O load. The 2-candidate itemsets matrix is used to prune the generation of (k+1)-frequent itemsets from k-frequent itemsets.
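The pruning idea can be illustrated with a small single-machine sketch. It is not the thesis's DFIMA or its Spark implementation, whose details the abstract does not give. It only shows one plausible reading of the technique: a candidate (k+1)-itemset is kept only if the new item forms a frequent pair with every item already in the k-itemset. The function names `frequent_pair_matrix` and `extend_with_pruning`, the `min_support` parameter, and the toy transactions are assumptions made for this illustration.

```python
from itertools import combinations


def frequent_pair_matrix(transactions, items, min_support):
    """Build a boolean matrix whose entry [i][j] is True when items i and j
    co-occur in at least min_support transactions (hypothetical helper)."""
    index = {item: n for n, item in enumerate(items)}
    counts = [[0] * len(items) for _ in items]
    for transaction in transactions:
        present = sorted({index[x] for x in transaction if x in index})
        for a, b in combinations(present, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    matrix = [[c >= min_support for c in row] for row in counts]
    return index, matrix


def extend_with_pruning(frequent_k, items, index, matrix):
    """Generate (k+1)-candidates from frequent k-itemsets.

    Each itemset is a tuple in the order of `items`. A candidate is kept
    only if the appended item is a frequent pair with every existing item,
    which is a necessary condition for the candidate to be frequent."""
    candidates = set()
    for itemset in frequent_k:
        last = index[itemset[-1]]
        for j in range(last + 1, len(items)):
            if all(matrix[index[x]][j] for x in itemset):
                candidates.add(itemset + (items[j],))
    return candidates
```

The surviving candidates still have to be counted against the transactions to confirm their support; the matrix check only avoids generating candidates that cannot be frequent.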
The comparison experiments show that DFIMA reduces the number of candidate itemsets during iteration and performs well in both speedup and scalability.

Afterwards, Hadoop is used to implement the weighted average one-dependence estimator improved by association rules (Hadoop-AR-WAODE), which comprises preprocessing, training, and classification. The comparison experiments show that Hadoop-AR-WAODE improves classification accuracy and efficiency by taking into account the dependencies between non-parent features and the contributions of the individual AODE models.

Finally, this thesis applies Hadoop-AR-WAODE to a practical disease prediction problem. Guided by the conclusions of a statistical analysis of the original samples, this thesis designs and implements a model for disease classification. Medical samples and meteorological data serve as the input, and disease categories serve as the output. The final comparison experiments show that the classification results are not very good, limited by the immaturity of disease prediction theory. Nevertheless, the model shows good efficiency and expandability, which suggests that it is useful for disease classification.
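As a rough illustration of the weighted one-dependence averaging that Hadoop-AR-WAODE builds on, the single-machine sketch below scores each class by a weighted sum over one-dependence models, each of which treats one feature as the shared parent of all the others. The class name `WeightedAODE`, the `weights` argument, the Laplace smoothing constant `alpha`, and the categorical-feature assumption are all choices made for this illustration. The abstract does not say how the thesis derives weights or dependencies from association rules, so the weights are simply supplied by the caller, and no Hadoop code is shown.

```python
from collections import Counter


class WeightedAODE:
    """Minimal weighted AODE for categorical features (illustrative only)."""

    def __init__(self, weights=None, alpha=1.0):
        self.weights = weights  # one weight per parent feature (assumed given)
        self.alpha = alpha      # Laplace smoothing constant (assumption)

    def fit(self, rows, labels):
        self.n = len(rows)
        self.n_features = len(rows[0])
        self.classes = sorted(set(labels))
        self.values = [sorted({r[i] for r in rows})
                       for i in range(self.n_features)]
        self.parent = Counter()  # counts of (class, i, x_i)
        self.pair = Counter()    # counts of (class, i, x_i, j, x_j)
        for row, y in zip(rows, labels):
            for i, xi in enumerate(row):
                self.parent[(y, i, xi)] += 1
                for j, xj in enumerate(row):
                    if j != i:
                        self.pair[(y, i, xi, j, xj)] += 1
        if self.weights is None:
            self.weights = [1.0] * self.n_features  # plain AODE averaging
        return self

    def scores(self, row):
        """Return normalized class scores for one instance."""
        raw = {}
        for y in self.classes:
            total = 0.0
            for i, xi in enumerate(row):
                # Smoothed estimate of P(y, x_i).
                denom = self.n + self.alpha * len(self.classes) * len(self.values[i])
                p = (self.parent[(y, i, xi)] + self.alpha) / denom
                # Smoothed estimates of P(x_j | y, x_i) for every other feature.
                for j, xj in enumerate(row):
                    if j != i:
                        p *= (self.pair[(y, i, xi, j, xj)] + self.alpha) / (
                            self.parent[(y, i, xi)] + self.alpha * len(self.values[j]))
                total += self.weights[i] * p
            raw[y] = total
        norm = sum(raw.values())
        return {y: s / norm for y, s in raw.items()}

    def predict(self, row):
        s = self.scores(row)
        return max(s, key=s.get)
```

With `weights=None` every parent feature contributes equally, which corresponds to unweighted AODE; passing a different weight per feature lets more informative one-dependence models dominate the sum.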
Keywords/Search Tags: disease prediction, Bayes classification, cloud framework, weighted instance, frequent itemsets, matrix pruning