Bayesian Network Parallel Learning And Incremental Maintenance For Data-Intensive Computing

Posted on: 2015-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Fang
Full Text: PDF
GTID: 2208330431969108
Subject: Computer application technology
Abstract/Summary:
The Bayesian Network (BN), as one of the most popular probabilistic graphical models for representing and reasoning about uncertain knowledge, plays an important role in data and knowledge engineering. BN learning is the premise and foundation of representing and reasoning about uncertain knowledge with a BN, and learning a BN from data can effectively avoid or correct the subjectivity and one-sidedness introduced by expert knowledge. With the popularity of Web applications and advances in information acquisition technology, the data that people generate and collect is characterized by large scale, distributed storage, and dynamic change, and traditional BN learning methods are not suited to these characteristics of massive data. Against this background, data-intensive computing, and the MapReduce programming model in particular, offers strong technical support for managing and analyzing massive data, making it feasible to learn a BN from massive data and to maintain it.

Addressing the large scale and distributed storage of massive data, this thesis proposes a MapReduce-based BN parallel learning algorithm for data-intensive computing, obtained by analyzing traditional score-and-search BN learning methods and extending their key steps. The parallel learning algorithm has two phases: parameter learning and structure learning. In parameter learning, the parameters required for structure scoring are obtained through statistical analysis of the massive sample data. In structure learning, the candidate structures of each node are scored with these parameters, and the local optimal structures are combined into the global optimal structure, i.e., the BN learned from the massive data.

Addressing the dynamic change of massive data, we extend a traditional incremental maintenance approach with MapReduce and propose an incremental approach for maintaining the original BN under data-intensive computing. Through the map and reduce phases, the new data is processed in parallel to obtain its probabilistic parameters with respect to the original BN, from which the degree of inconsistency between the BN and the new data is computed. According to this degree of inconsistency, the nodes that need to be relearned are relearned with the parallel learning algorithm to obtain a local structure, and finally the local structure is combined with the original BN to obtain the maintained BN. The correctness and execution efficiency of the proposed approaches have been tested on the Hadoop platform, and the experimental results indicate that a BN can be obtained from massive data with these approaches.
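To make the two-phase parallel learning concrete, the sketch below illustrates the kind of map/reduce counting and decomposable scoring the abstract describes. It is a minimal illustration, not the thesis's implementation: the function names (map_counts, reduce_counts, k2_style_local_score), the candidate parent_sets input, the K2-like score, and the in-memory Counter-based reduce are all assumptions standing in for the actual Hadoop jobs and scoring function.

```python
from collections import Counter, defaultdict
from functools import reduce
import math

def map_counts(data_split, parent_sets):
    """Map phase: scan one split of the sample data and emit counts
    N(node = value, parents = config) for each node and its candidate parent set."""
    counts = Counter()
    for record in data_split:                       # record: dict {variable: value}
        for node, parents in parent_sets.items():
            pa_config = tuple(record[p] for p in parents)
            counts[(node, record[node], pa_config)] += 1
    return counts

def reduce_counts(partial_counts):
    """Reduce phase: sum the partial counters produced by all map tasks."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())

def k2_style_local_score(node, counts, node_cardinality):
    """Assumed K2-like decomposable local score computed from the reduced counts."""
    by_config = defaultdict(list)
    for (n, value, pa_config), c in counts.items():
        if n == node:
            by_config[pa_config].append(c)
    r = node_cardinality[node]
    score = 0.0
    for cs in by_config.values():
        n_ij = sum(cs)
        score += math.lgamma(r) - math.lgamma(n_ij + r)
        score += sum(math.lgamma(c + 1) for c in cs)
    return score
```

Because the assumed score is decomposable, the only step that touches the full data set is the counting, which is exactly what the map and reduce phases parallelize; each node's candidate structures can then be scored and the local optima combined without rescanning the data.

The incremental-maintenance step can be sketched in the same spirit: the new data is counted with the same map/reduce routine, a per-node degree of inconsistency against the original BN's conditional probability tables is computed, and nodes above a threshold are flagged for relearning. The KL-divergence-style measure, the old_cpt lookup format, and the 0.1 threshold below are illustrative assumptions; the thesis does not specify its exact inconsistency measure here.

```python
import math
from collections import defaultdict

def inconsistency_degree(node, new_counts, old_cpt, eps=1e-9):
    """Average divergence (assumed KL-based) between the new data's empirical
    conditional distribution of `node` and the original BN's CPT entries,
    taken over the parent configurations observed in the new data."""
    by_config = defaultdict(dict)
    for (n, value, pa_config), c in new_counts.items():
        if n == node:
            by_config[pa_config][value] = c
    divergences = []
    for pa_config, value_counts in by_config.items():
        total = sum(value_counts.values())
        kl = 0.0
        for value, c in value_counts.items():
            p_new = c / total
            p_old = old_cpt.get((value, pa_config), eps)   # CPT keyed by (value, parent config)
            kl += p_new * math.log(p_new / max(p_old, eps))
        divergences.append(kl)
    return sum(divergences) / len(divergences) if divergences else 0.0

def nodes_to_relearn(parent_sets, new_counts, cpts, threshold=0.1):
    """Flag nodes whose inconsistency degree exceeds the (assumed) threshold;
    only these nodes are handed back to the parallel structure-learning step."""
    return {node for node in parent_sets
            if inconsistency_degree(node, new_counts, cpts[node]) > threshold}
```

Restricting relearning to the flagged nodes yields the local structure mentioned above, which is then merged with the unchanged part of the original BN to produce the maintained network.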
Keywords/Search Tags: Data-intensive computing, Bayesian network, Parallel learning, Incremental maintenance, MapReduce