| A Bayesian network is a probabilistic graphical model and an important tool for representing and reasoning about uncertain knowledge. Learning the structure of a Bayesian network from data is an effective way to acquire knowledge. With massive data now commonplace, learning Bayesian network structure from such data has become a key problem in large-scale data analysis and knowledge discovery. Traditional structure learning methods generally run in a stand-alone (single-machine) environment, whose storage and processing capacity is insufficient for massive data. Research on learning Bayesian network structure from massive data is therefore of considerable significance. In recent years, with the continuous development of cloud computing, many tools and methods for managing and analyzing massive data in cloud environments have appeared. Hadoop is one of them: a cloud computing platform well suited to processing massive data in parallel, so we use Hadoop to learn Bayesian network structure from massive data. In this paper, we extend the traditional Bayesian network learning method to Hadoop, and design and implement a structure learning method for massive data that uses the K2 score as the scoring function and a genetic algorithm as the search strategy. A genetic algorithm is an optimization search algorithm based on the principles of natural selection and genetics. It is inherently parallel, its encoding techniques and genetic operators are relatively simple and straightforward, and it is adaptable and robust when dealing with complex problems. Experimental results show that the proposed method is correct and effective. It also offers a new approach to learning Bayesian network structures from massive data. |
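
To illustrate the scoring step named in the abstract, the following is a minimal Python sketch of the log K2 score for a single node, written in the standard Cooper-Herskovits form. It is not the paper's Hadoop implementation: the function name `k2_log_score`, its arguments, and the toy data are assumptions made for this example. The counting loop is the part a MapReduce job could plausibly distribute, although the abstract does not describe how the work is split.

```python
from collections import Counter
from math import lgamma


def k2_log_score(rows, child, parents, arity):
    """Log K2 score of one node given a candidate parent set.

    rows    -- list of tuples of discrete values (one tuple per record)
    child   -- column index of the node being scored
    parents -- tuple of column indexes of the candidate parents
    arity   -- number of states the child variable can take (r_i)
    """
    # N_ijk: how often the child takes state k under parent configuration j.
    counts = Counter()
    for row in rows:
        config = tuple(row[p] for p in parents)
        counts[(config, row[child])] += 1

    # N_ij: total number of records for each parent configuration j.
    totals = Counter()
    for (config, _state), n_ijk in counts.items():
        totals[config] += n_ijk

    # log[(r_i - 1)! / (N_ij + r_i - 1)!] for each observed configuration,
    # using lgamma(n + 1) = log(n!). Unobserved configurations contribute 0.
    score = sum(lgamma(arity) - lgamma(n_ij + arity) for n_ij in totals.values())
    # Plus log(N_ijk!) for each observed (configuration, state) pair.
    score += sum(lgamma(n_ijk + 1) for n_ijk in counts.values())
    return score


# Hypothetical toy data: column 1 always copies column 0.
toy_rows = [(0, 0), (0, 0), (1, 1), (1, 1)]
score_with_parent = k2_log_score(toy_rows, child=1, parents=(0,), arity=2)
score_without_parent = k2_log_score(toy_rows, child=1, parents=(), arity=2)
print(score_with_parent > score_without_parent)
```

On this toy data the parent set containing column 0 scores higher than the empty parent set, which is the preference a search strategy such as a genetic algorithm would exploit when comparing candidate structures.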