Font Size: a A A

Probabilistic Graphical Models For Data-intensive Computing Construction Method And Implementation

Posted on:2014-10-06Degree:MasterType:Thesis
Country:ChinaCandidate:Y WangFull Text:PDF
GTID:2268330401453948Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In the representation and reasoning area of uncertain knowledge, Bayesian Network as an important probabilistic graphical model which based on the graphical network of probabilistic reasoning is the basic framework with the statistical uncertainty of knowledge representation and reasoning. At the present, from the point of view of existing knowledge and technology about uncertainty knowledge representation, the traditional Bayesian Network learning stagnates in a stand-alone environment. When learning BN from data-intensive data, there are many problems exist, such as low data storage capacity, slow data; scanning speed, slow calculated speed, inflexible data learning and so on. Therefore, this research about how to learn BN from data-intensive data has important research significance.Meanwhile, with the development of data-intensive calculation, it has achieved some important results, such as in handling massive, rapid changes, and distributed and heterogeneous data and so on.In recent years, with the continuous expansion of Cloud Computing technology applications, there is a growing concern and investigation about the data-intensive data management and analysis in cloud computing environment. At the same time, Hadoop as an important platform for cloud computing, its HBase cloud database has massive data storage capacity. Moreover, MapReduce programming model can quickly process massive data, which makes it is possible to learn BN from the massive data with using of Hadoop cloud computing platform.This research is in order to achieve massive data BN learning, it uses Hadoop as cloud platform, and traditional BN as supporting theory. In this paper, the main research jobs completed the construction method and achievement of data-intensive probabilistic graphical model, Include three parts:data preprocessing, BN structure learning and BN storage; more specifically:Firstly, Be based on MapReduce data preprocessing, we improved the way of data storage:firstly, using MapReduce to statistic on massive data; and then storing them in HBase cloud database; lastly, efficiently compressing the data storage amount. At the same time, to read the compressed massive data sample from HBase, to use the MapReduce to quickly calculate the marginal probability values which needed to build BN, and to store results in HBase to support probability table for subsequent learning of BN.Secondly, MapReduce-based on BN structure learning, we analyzed and expanded the traditional BN Hill-climbing algorithm, Minimal Description Length, and made it was applicable to the Hadoop platform, in order to enable fast and parallel BN learning.At last, HBase-based BN storage mechanism, we designed a specific BN storage structure; meanwhile, well calculated the BN corresponding Conditional Probability Table and stored in HBase; in order to offer technical support for subsequent Bayesian network inferring.
Keywords/Search Tags:Massive data, Cloud computing, Probabilistic graphical model, Bayesian networklearning, scoring&searching BN learning methods, MapReduce
PDF Full Text Request
Related items