
Bayesian Network Learning, Reasoning, and Application in Data-Intensive Computing Environments

Posted on: 2014-09-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Ma
Full Text: PDF
GTID: 1268330425976349
Subject: Communication and Information System
Abstract/Summary:
With the spread of the Internet and advances in information technology, people's ability to produce and collect data is growing rapidly, and the amount of data that must be handled is increasing just as quickly. Such data are typically very large in volume and distributed across multiple sites. Data-Intensive Computing makes it possible to process data of this kind. It supports the development of technology for acquiring, managing, analyzing, and understanding large and fast-changing data, and it has become an active research topic in data analysis.

A Bayesian network combines probability theory with graph theory. It is a useful tool for applying probability and statistics to complex domains, uncertain reasoning, and data analysis. However, traditional Bayesian network methods generally assume that all of the data are gathered at a single site before they are processed. In a Data-Intensive Computing environment it is therefore difficult to apply the traditional theory and methods directly, and they must be adapted so that the related theory, methods, and conclusions carry over to the new setting.

The main work and innovations of this dissertation are summarized as follows:

(1) A Bayesian network learning method for Data-Intensive Computing. In a Data-Intensive Computing environment, the data are usually large in volume and distributed across multiple sites, so the traditional method needs to be extended before it can be applied. Building a Bayesian network is usually divided into two parts, parameter learning and structure learning. Once the structure has been identified, learning the parameters is relatively easy, so this dissertation focuses on structure learning in the distributed environment. In practice, data arrive continuously at each site, so the dissertation concentrates on that case (the second of the cases it considers).

(2) A Bayesian inference method for Data-Intensive Computing. Traditional Bayesian inference generally assumes that the entire data set resides at one site. If traditional inference is run separately at each site of a Data-Intensive Computing environment, each result applies only to the site that produced it. Because the data differ from site to site, the results may be inconsistent with one another and may even conflict. This dissertation adopts a widely used Bayesian inference approach, a randomized algorithm built around Gibbs sampling, and obtains a final result that fits the data as a whole. The validity of the method is discussed theoretically and illustrated by experiment.

(3) A specific application of Bayesian networks in the Data-Intensive Computing environment: community detection. Community detection has been an active research topic in recent years. For various practical reasons, its data sets match the characteristics of the Data-Intensive Computing environment: they are large and often distributed. This dissertation presents a method that constructs a corresponding network during the discovery of frequent itemsets with association rules and then uses that network for community detection. The method has two advantages. First, it can be applied directly in the Data-Intensive Computing environment, which extends the range of application of the traditional Bayesian network. Second, it makes full use of the information obtained while association rule mining discovers frequent items, and builds a latent network for community detection.
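
As a rough illustration of the Gibbs sampling mentioned in point (2), the sketch below runs a Gibbs sampler on a toy three-node Bayesian network at a single site. It is not the dissertation's distributed algorithm, which the abstract does not specify. The network (rain, sprinkler, wet grass), every probability value, and all function names are hypothetical choices made for the example. The sampler estimates P(rain = 1 | wet = 1) and can be compared with the exact value computed by enumeration.

```python
import random

# Hypothetical toy network: Rain -> Wet <- Sprinkler (all numbers are made up).
P_RAIN = 0.2        # P(rain = 1)
P_SPRINKLER = 0.3   # P(sprinkler = 1)
# P(wet = 1 | rain, sprinkler), keyed by (rain, sprinkler)
P_WET = {(0, 0): 0.01, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}


def prior(p_one, value):
    """Probability of a binary value under a Bernoulli prior with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def exact_posterior():
    """P(rain = 1 | wet = 1) by summing the joint over the sprinkler variable."""
    joint = {
        r: sum(prior(P_RAIN, r) * prior(P_SPRINKLER, s) * P_WET[(r, s)]
               for s in (0, 1))
        for r in (0, 1)
    }
    return joint[1] / (joint[0] + joint[1])


def resample(p_one, likelihood, rng):
    """Draw a binary variable from its full conditional given wet = 1.

    The conditional is proportional to prior(value) * P(wet = 1 | value, other).
    """
    w1 = p_one * likelihood(1)
    w0 = (1.0 - p_one) * likelihood(0)
    return 1 if rng.random() < w1 / (w0 + w1) else 0


def gibbs_posterior(n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(rain = 1 | wet = 1) by alternating full-conditional draws."""
    rng = random.Random(seed)
    rain, sprinkler = 0, 1  # arbitrary starting state
    hits = 0
    for step in range(burn_in + n_samples):
        rain = resample(P_RAIN, lambda v: P_WET[(v, sprinkler)], rng)
        sprinkler = resample(P_SPRINKLER, lambda v: P_WET[(rain, v)], rng)
        if step >= burn_in:  # discard the burn-in draws
            hits += rain
    return hits / n_samples
```

Calling `gibbs_posterior()` and `exact_posterior()` side by side shows how closely the sampled estimate tracks the exact posterior, which is about 0.436 for these made-up numbers. In a multi-site setting, the conditional probability tables would first have to be estimated from data held at different sites, a step this sketch leaves out.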
Keywords/Search Tags: Data-Intensive Computing, Bayesian Network, Gibbs Sampling, Community Detection