Mining Of Large Data Causality Based On Bayesian Network

Posted on: 2017-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: H Yao
GTID: 2278330485450738
Subject: Computer application technology
Abstract/Summary:
Global data volume entered the zettabyte (ZB) era in 2010, and IDC predicts that it will reach 35 ZB by 2020. Large amounts of data will affect our work, our lives, and even national economic and social development in real time. The Big Data era has arrived. With the rapid development of big data, probability-based statistical machine learning has attracted great attention from industry and academia in recent years and has been applied successfully in the Internet, finance, natural language processing, biology, and other fields. Among these methods, the Bayesian network has also developed quickly and has become a very important machine learning method.

The Bayesian network is a graphical model that describes the causal relationships between random variables. It combines probability theory, causal reasoning, and graph theory, and it unifies traditional data-driven statistical methods with knowledge-oriented artificial intelligence methods. One of its important applications is the representation of, and reasoning about, causal knowledge among random variables. A Bayesian network consists of two parts, structure and parameters, which describe the causal relationships between variables qualitatively and quantitatively, respectively. It is versatile, effective, and open. When dealing with uncertain real-world problems, a Bayesian network can effectively transform data into knowledge and use that knowledge for reasoning. Its effectiveness has been verified in financial risk analysis, information security, DNA analysis, software intelligence, medical diagnostics, system analysis and control, and other areas.

Currently, Bayesian networks are usually used to mine causal relationships in conventional non-sequential data, while the Granger method is used to mine a specific causal relationship in a conventional single time series.
However, this approach has many problems. With the advent of the big data era, big data technology provides new ideas and methods for analyzing and solving problems. Compared with conventional data sets, data mining in a big data environment provides more comprehensive information. Discovering causality in big data, and mining general causal relationships in general data, will be a future trend.

To address the shortcomings of traditional Granger models in mining causality from time series, and to take a step toward a better causality mining model, this thesis proposes using a second-order Bayesian network model for causality mining in big data environments. The model uses the minimum description length (MDL) principle for scoring. The original futures time series are first discretized, reduced in attributes, and reconstructed; by analyzing the futures sample data and training the second-order Bayesian network model on the processed series, we can mine not only the causality between nodes but also the relationships between causal relations.

The main work and research results of this thesis are as follows:

1. Existing causality mining models and Bayesian network structure learning methods are analyzed and compared, and a Bayesian network model based on MDL scoring is selected as the research method.
2. A novel Bayesian network model, the second-order Bayesian network model, is proposed. A new model construction method is designed, and the related algorithms are implemented.
3. Simulation experiments are carried out on futures time series using the second-order Bayesian network inference model. The experiments obtain not only the causal links between the internal nodes of the Bayesian network within an individual futures time series, but also the causal links between the edges of the Bayesian networks across multiple time series.
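
To make the MDL scoring idea concrete, the following is a minimal sketch of how a candidate Bayesian network structure might be scored on discretized data. The abstract does not give the thesis's exact scoring formula, so this assumes the commonly used form (negative log-likelihood plus a penalty of (log N)/2 per free parameter, lower is better). The function name `mdl_score`, the data layout, and the toy variables `x` and `y` are hypothetical, and the second-order model itself is not reproduced here.

```python
import math
from collections import Counter


def mdl_score(data, parents):
    """MDL score of a discrete Bayesian network structure (lower is better).

    data:    list of dicts mapping variable name -> discrete value
    parents: dict mapping each variable -> tuple of its parent variables
    Assumed form: -log-likelihood + (log N / 2) * number of free parameters.
    """
    n = len(data)
    total = 0.0
    for var, pa in parents.items():
        # Counts of (parent configuration, value) and of parent configuration.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        pa_counts = Counter(tuple(row[p] for p in pa) for row in data)
        # Negative log-likelihood under maximum-likelihood parameters.
        nll = -sum(c * math.log(c / pa_counts[cfg])
                   for (cfg, _), c in joint.items())
        # Free parameters: (r - 1) * q, where r is the number of observed
        # states of var and q is the product of parent cardinalities.
        r = len({row[var] for row in data})
        q = 1
        for p in pa:
            q *= len({row[p] for row in data})
        total += nll + 0.5 * math.log(n) * (r - 1) * q
    return total


# Toy data (hypothetical): y is an exact copy of x.
data = [{"x": i % 2, "y": i % 2} for i in range(8)]
with_edge = {"x": (), "y": ("x",)}   # structure x -> y
no_edge = {"x": (), "y": ()}         # x and y independent

print("x -> y :", mdl_score(data, with_edge))
print("no edge:", mdl_score(data, no_edge))
```

On this toy data the structure with the edge receives the lower score, because the extra parameters cost less than the likelihood they buy. A structure learning procedure would search over candidate structures and keep the one with the lowest score.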
Keywords/Search Tags: Big Data, Bayesian network, Minimum Description Length (MDL), Causality, Data Mining