Bayesian Network Parallel Learning And Incremental Maintenance For Data-Intensive Computing

Posted on: 2015-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Fang
Full Text: PDF
GTID: 2208330431969108
Subject: Computer application technology
Abstract/Summary:
The Bayesian Network (BN), as one of the most popular probabilistic graphical models for representing and reasoning about uncertain knowledge, plays an important role in data and knowledge engineering. BN learning is the premise and foundation of representing and reasoning about uncertain knowledge with a BN, and learning a BN from data can effectively avoid or correct the subjectivity and one-sidedness introduced by expert knowledge. With the popularity of Web applications and advances in information acquisition technology, the data that people generate and collect is characterized by large scale, distributed storage, and dynamic change, and traditional BN learning methods are not suited to these characteristics of massive data. Against this background, data-intensive computing, and the MapReduce programming model in particular, offers strong technical support for managing and analyzing massive data, making it feasible to learn a BN from massive data and to maintain it.

Addressing the large scale and distributed storage of massive data, this thesis proposes a MapReduce-based BN parallel learning algorithm for data-intensive computing, obtained by analyzing traditional score-and-search BN learning methods and extending their key steps. The parallel learning algorithm has two phases: parameter learning and structure learning. In parameter learning, the parameters required for structure scoring are obtained through statistical analysis of the massive sample data. In structure learning, the candidate structures of each node are scored with these parameters, and the local optimal structures are combined into the global optimal structure, i.e., the BN learned from the massive data.

Addressing the dynamic change of massive data, we extend a traditional incremental maintenance approach with MapReduce and propose an incremental approach for maintaining the original BN under data-intensive computing. Through the map and reduce phases, the new data is processed in parallel to obtain its probabilistic parameters with respect to the original BN, from which the degree of inconsistency between the BN and the new data is computed. According to this degree of inconsistency, the nodes that need to be relearned are relearned with the parallel learning algorithm to obtain a local structure, and finally the local structure is combined with the original BN to obtain the maintained BN. The correctness and execution efficiency of the proposed approaches have been tested on the Hadoop platform, and the experimental results indicate that a BN can be obtained from massive data with these approaches.
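To make the two-phase parallel learning concrete, the sketch below illustrates the kind of map/reduce counting and decomposable scoring the abstract describes. It is a minimal illustration, not the thesis's implementation: the function names (map_counts, reduce_counts, k2_style_local_score), the candidate parent_sets input, the K2-like score, and the in-memory Counter-based reduce are all assumptions standing in for the actual Hadoop jobs and scoring function.

```python
from collections import Counter, defaultdict
from functools import reduce
import math

def map_counts(data_split, parent_sets):
    """Map phase: scan one split of the sample data and emit counts
    N(node = value, parents = config) for each node and its candidate parent set."""
    counts = Counter()
    for record in data_split:                       # record: dict {variable: value}
        for node, parents in parent_sets.items():
            pa_config = tuple(record[p] for p in parents)
            counts[(node, record[node], pa_config)] += 1
    return counts

def reduce_counts(partial_counts):
    """Reduce phase: sum the partial counters produced by all map tasks."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())

def k2_style_local_score(node, counts, node_cardinality):
    """Assumed K2-like decomposable local score computed from the reduced counts."""
    by_config = defaultdict(list)
    for (n, value, pa_config), c in counts.items():
        if n == node:
            by_config[pa_config].append(c)
    r = node_cardinality[node]
    score = 0.0
    for cs in by_config.values():
        n_ij = sum(cs)
        score += math.lgamma(r) - math.lgamma(n_ij + r)
        score += sum(math.lgamma(c + 1) for c in cs)
    return score
```

Because the assumed score is decomposable, the only step that touches the full data set is the counting, which is exactly what the map and reduce phases parallelize; each node's candidate structures can then be scored and the local optima combined without rescanning the data.

The incremental-maintenance step can be sketched in the same spirit: the new data is counted with the same map/reduce routine, a per-node degree of inconsistency against the original BN's conditional probability tables is computed, and nodes above a threshold are flagged for relearning. The KL-divergence-style measure, the old_cpt lookup format, and the 0.1 threshold below are illustrative assumptions; the thesis does not specify its exact inconsistency measure here.

```python
import math
from collections import defaultdict

def inconsistency_degree(node, new_counts, old_cpt, eps=1e-9):
    """Average divergence (assumed KL-based) between the new data's empirical
    conditional distribution of `node` and the original BN's CPT entries,
    taken over the parent configurations observed in the new data."""
    by_config = defaultdict(dict)
    for (n, value, pa_config), c in new_counts.items():
        if n == node:
            by_config[pa_config][value] = c
    divergences = []
    for pa_config, value_counts in by_config.items():
        total = sum(value_counts.values())
        kl = 0.0
        for value, c in value_counts.items():
            p_new = c / total
            p_old = old_cpt.get((value, pa_config), eps)   # CPT keyed by (value, parent config)
            kl += p_new * math.log(p_new / max(p_old, eps))
        divergences.append(kl)
    return sum(divergences) / len(divergences) if divergences else 0.0

def nodes_to_relearn(parent_sets, new_counts, cpts, threshold=0.1):
    """Flag nodes whose inconsistency degree exceeds the (assumed) threshold;
    only these nodes are handed back to the parallel structure-learning step."""
    return {node for node in parent_sets
            if inconsistency_degree(node, new_counts, cpts[node]) > threshold}
```

Restricting relearning to the flagged nodes yields the local structure mentioned above, which is then merged with the unchanged part of the original BN to produce the maintained network.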
Keywords/Search Tags: Data-intensive computing, Bayesian network, Parallel learning, Incremental maintenance, MapReduce