Research And Application Of Bayesian Algorithm Based On Cloud Computing For Disease Predicting

Posted on:2017-05-31

Degree:Master

Type:Thesis

Country:China

Candidate:H H Fu

Full Text:PDF

GTID:2180330485451826

Subject:Computer application technology

Abstract/Summary:

Disease diagnosis is an important task in the medical field, and medical institutions of all kinds are collecting more and more patient samples. Predictions made by humans are inevitably subject to error because of subjective factors such as experience and decision-making ability, so both accuracy and efficiency need to be improved. The prediction theory of TCM (Traditional Chinese Medicine) holds that health is closely associated with the internal and external environment. The class-attribute joint probability required by the Bayesian classifier, which is based on probability and statistics, is difficult to estimate. A single-machine environment can't process large samples within the expected time. An ideal classification model should connect features with diseases while improving classification accuracy and extensibility. This thesis addresses these problems through the following work.

Firstly, this thesis proposes an improved weighted instance multinomial Naive Bayes (IWIMNB) algorithm based on cosine similarity and local learning. IWIMNB weakens the conditional independence assumption by using local training samples to construct a high-quality classifier near the validation samples. Cosine similarity measures the distance between samples and is used as the weight for training the improved classification model. Comparative experiments show that IWIMNB has strong operability and better precision.

Secondly, this thesis applies association rules to the weighted average one-dependence Bayes classifier, so as to account for dependencies between non-parental features and for the contributions of the various AODE models. To speed up the generation of association rules, it proposes a distributed frequent itemsets mining algorithm (DFIMA), implemented on Spark and based on matrix pruning, which aims to reduce useless candidate itemsets and system I/O load. The 2-candidate itemsets matrix is used to prune the generation of (k+1)-frequent itemsets from k-frequent itemsets. Comparative experiments show that DFIMA reduces the number of candidate itemsets during iteration and performs well in both speedup and scalability.

Afterwards, Hadoop is used to implement the weighted average one-dependence estimator improved by association rules (Hadoop-AR-WAODE), which comprises preprocessing, training and classification. Comparative experiments show that Hadoop-AR-WAODE improves classification accuracy and efficiency by considering dependencies between non-parental features and the contributions of the various AODE models.

Finally, this thesis applies Hadoop-AR-WAODE to a practical disease prediction problem. Guided by the conclusions of a statistical analysis of the original samples, it designs and implements a model for disease classification. Medical samples and meteorological data serve as the input, and disease categories as the output. The final comparative experiment shows that the classification results are not very good, which is attributed to the immaturity of disease prediction theory. Nevertheless, the model shows good efficiency and expandability, which suggests that it is meaningful for disease classification.
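
The matrix-pruning idea described for DFIMA can be illustrated with a minimal single-machine sketch. This is not the thesis's Spark implementation, which the abstract does not detail. It is an Apriori-style loop written under the assumption that "matrix pruning" means discarding a (k+1)-candidate whenever any pair of its items is not a frequent 2-itemset. The function name `frequent_itemsets`, the use of a set of frequent pairs in place of a matrix, and the sample transactions are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset.

    Assumes items are mutually comparable (e.g. all strings).
    """
    transactions = [frozenset(t) for t in transactions]

    # Count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    items = sorted(i for i, c in counts.items() if c >= min_support)
    item_set = set(items)
    result = {frozenset([i]): counts[i] for i in items}

    # Build the 2-itemset "matrix": here a set of frequent (a, b) pairs, a < b.
    pair_counts = {}
    for t in transactions:
        for pair in combinations(sorted(t & item_set), 2):
            pair_counts[pair] = pair_counts.get(pair, 0) + 1
    frequent_pair = {p for p, c in pair_counts.items() if c >= min_support}

    current = {frozenset(p): pair_counts[p] for p in frequent_pair}
    while current:
        result.update(current)
        candidates = set()
        for itemset in current:
            last = max(itemset)
            for x in items:
                if x <= last:
                    continue  # extend only with larger items to avoid duplicates
                # Matrix pruning: every pair (y, x) must itself be frequent,
                # otherwise the candidate is dropped before any support count.
                if all((y, x) in frequent_pair for y in itemset):
                    candidates.add(itemset | {x})
        # Count support only for the candidates that survived pruning.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(
        frequent_itemsets(data, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
    ):
        print(sorted(itemset), support)
```

Because a candidate is checked against the pair lookup before the transactions are scanned, candidates that cannot be frequent never incur a support-counting pass, which is the saving the abstract attributes to pruning.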

Keywords/Search Tags:

disease prediction, bayes classification, cloud framework, weighted instance, frequent itemsets, matrix pruning

Related items

1	Research On Pre-seismic Anomalies Mining Algorithms
2	Research On Frequent Itemsets Mining Algorithm Of Soybean Promoter Based On Bit Combination
3	Researches About Ensemble Pruning Evaluation Measures And PS-ELMs Model With Application To Time Series Prediction
4	The Empirical Bayes Estimations Of Parameters For Multi-classification Model
5	Research Of Software Important Patterns Based On Complex Network
6	The Method Of Instance Selection In Pattern Classification
7	Research On Link Prediction Of Heterogeneous Information Networks Based On Frequent Subgraph Evolution
8	Associations Prediction Between SnoRNA And Diseases By Matrix Inference
9	The Research On Land-Cover Classification Based On Multi-Instance Learning In Haidian,Beijing
10	Prediction Of Disease Genes Based On Nonlinear Induction Matrix Completion Model