
Research And Implementation Of Health Data Classification Model Based On Bayesian Network

Posted on: 2019-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: C Liang
Full Text: PDF
GTID: 2348330563953967
Subject: Computer application technology
Abstract/Summary:
With the advancement of information technology, artificial intelligence has been widely applied in many fields. In medical diagnosis in particular, the growing number of physiological indicators and disease types has made it increasingly difficult for doctors to diagnose diseases. To address this problem, many researchers have applied machine learning to disease prediction. Such models can further verify a doctor's diagnosis, and they can also give doctors an analysis tool for complex diseases. A Bayesian network is an effective means of inference that combines probabilistic reasoning with graph theory. It can not only infer the posterior probability of a query but also clearly describe the dependencies between variables. This thesis takes hypothyroidism as the research object and constructs different Bayesian network classification models according to different network structures. The main work of this thesis consists of the following three parts:

(1) The K2 algorithm requires a node order to be supplied in advance when learning a Bayesian network structure. This thesis therefore uses a genetic algorithm to learn the node order. However, the traditional genetic algorithm suffers from slow convergence and low convergence precision. An improved algorithm is proposed that divides the evolution into two stages: a gradient genetic stage and a mutation genetic stage. The gradient genetic stage aims to obtain better populations quickly, so it uses a crossover and mutation method based on competition and elimination. The mutation genetic stage aims to increase the diversity of the population as much as possible, so it uses a dynamically increasing mutation probability to prevent the evolution from falling into a local optimum. Experiments show that the improved algorithm can learn a better network structure.

(2) The hypothyroidism data contains continuous attributes and missing values, so it must be preprocessed. After preprocessing, four Bayesian network classifiers are constructed: the naive Bayes classifier, the TAN classifier, the BAN classifier, and the MBN classifier. Experiments comparing the performance of these four classifiers show that the BAN classifier has the best average classification performance.

(3) An analysis of the effect of redundant attributes in the hypothyroidism data shows that using all attributes as network nodes does not give the best classification performance. Feature selection is therefore introduced into the classification model. An improved algorithm is proposed to address the problem that the ReliefF algorithm ignores the correlation between features. The improved algorithm uses symmetric uncertainty to measure the redundancy between features and further removes redundant features. Experimental results show that the classification model based on feature selection achieves better classification performance when its parameters are set properly.
Keywords/Search Tags: medical diagnosis, Bayesian network, structure learning, genetic algorithm, feature selection
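
Part (3) of the abstract measures redundancy between features with symmetric uncertainty. For two discrete variables it is commonly defined as SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and I is mutual information, so that 0 means independent and 1 means fully redundant. The following is only a minimal sketch of that measure, not the thesis's improved algorithm: the function names, the greedy filtering rule in drop_redundant, and the 0.8 threshold are illustrative assumptions, and the feature ranking that ReliefF would produce is taken here as a given input.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both features are constant; returning 0 is a convention chosen here.
        return 0.0
    h_xy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy       # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def drop_redundant(features, ranked_names, threshold=0.8):
    """Hypothetical greedy filter: walk features from best to worst rank and
    keep one only if its SU with every already-kept feature is below threshold.

    features      -- dict mapping feature name to a list of discrete values
    ranked_names  -- feature names ordered best first (e.g. by a ReliefF score)
    threshold     -- assumed redundancy cutoff, not a value from the thesis
    """
    kept = []
    for name in ranked_names:
        if all(symmetric_uncertainty(features[name], features[k]) < threshold
               for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    data = {
        "a": [0, 0, 1, 1],
        "b": [0, 0, 1, 1],   # exact copy of "a", so fully redundant
        "c": [0, 1, 0, 1],   # independent of "a"
    }
    print(drop_redundant(data, ["a", "b", "c"]))
```

Continuous attributes would need to be discretized before this measure applies, which matches the preprocessing step described in part (2).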