
A variational approach towards distributed data mining

Posted on: 2008-10-13
Degree: M.S.
Type: Thesis
University: University of Maryland, Baltimore County
Candidate: Mukherjee, Sourav
GTID: 2448390005469325
Subject: Computer Science
Abstract/Summary:
This thesis presents a general framework for applying the variational approximation technique to problems in Distributed Data Mining. Distributed Data Mining aims to analyze distributed data and extract useful information from it while accounting for computational cost, communication cost, storage requirements, and human-computer interaction. The thesis shows that the variational method, a deterministic approximation technique, can be used to formulate communication-efficient and scalable solutions to Distributed Data Mining problems. As illustrations, two important problems in Distributed Data Mining are chosen: Distributed Probabilistic Inferencing in Graphical Models, and Distributed Linear Regression over Vertically Partitioned Data. In both cases, analytical results are derived showing that the variational method leads to communication-efficient and scalable solutions, and these claims are validated experimentally. In terms of performance, the experiments show that sufficient accuracy can be achieved even with modest communication-bandwidth allowances, and that the variational techniques are highly scalable. The variational approximation framework is thus established as a suitable framework for formulating efficient solutions to Distributed Data Mining problems.

The thesis is organized as follows. Chapter 1 introduces and motivates Distributed Data Mining, discussing its importance and the challenges it poses. It also gives a brief introduction to the basic ideas of the variational approximation technique, reviews the existing literature relevant to Distributed Data Mining and variational methods, and enumerates the contributions of this thesis.
The subsequent chapters consider concrete problems in Distributed Data Mining and apply the variational method to solve them.

Chapter 2 considers the problem of Distributed Probabilistic Inferencing in a Graphical Model. It presents an algorithm, VIDE (Variational Inferencing in Distributed Environments), that achieves considerable accuracy while using much less communication than complete centralization of the data would require.

Chapter 3 considers Linear Regression in a Heterogeneously Distributed Environment. The variational method is applied to formulate algorithms both for learning the linear model and for using it for predictive modeling. Here too, the variational algorithms are shown to be much more communication-efficient than techniques that rely on full centralization of the data.

Finally, Chapter 4 concludes the thesis.
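The abstract does not spell out the Chapter 3 algorithms. As a rough illustration only, here is one standard way a variational (mean-field) factorization can decouple Bayesian linear regression across sites that hold different columns of the same rows. This is a minimal sketch under stated assumptions, not the thesis's method: it assumes a Gaussian prior with fixed precision `alpha`, Gaussian noise with fixed precision `beta`, a target vector `y` visible to every site, and a factorization q(w) = q(w_1)...q(w_K) with one factor per site. The function name `vertical_mean_field_regression` and all parameters are hypothetical.

```python
import numpy as np


def vertical_mean_field_regression(X_blocks, y, alpha=1.0, beta=25.0, n_sweeps=50):
    """Mean-field variational Bayesian linear regression over vertically
    partitioned data. Hypothetical sketch, not the thesis's algorithm.

    X_blocks: list of (n, d_k) arrays, one per site; same rows, disjoint columns.
    y:        (n,) target vector, assumed available to every site.
    alpha:    fixed prior precision on the weights (assumption).
    beta:     fixed noise precision (assumption).
    """
    n = y.shape[0]
    # Precision of each factor q(w_k) depends only on local columns, so each
    # site computes it once and never transmits it.
    precisions = [alpha * np.eye(X.shape[1]) + beta * X.T @ X for X in X_blocks]
    means = [np.zeros(X.shape[1]) for X in X_blocks]
    # Partial predictions X_k m_k: the only quantities exchanged between sites.
    partial = [np.zeros(n) for _ in X_blocks]

    for _ in range(n_sweeps):
        for k, X in enumerate(X_blocks):
            # Residual left for site k after the other sites' contributions.
            residual = y - (sum(partial) - partial[k])
            # Closed-form mean-field update for the Gaussian factor q(w_k).
            means[k] = np.linalg.solve(precisions[k], beta * X.T @ residual)
            partial[k] = X @ means[k]

    # Factor covariances; mean-field typically underestimates marginal variances.
    covs = [np.linalg.inv(P) for P in precisions]
    return means, covs


# Usage on synthetic data split across three hypothetical sites.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 6))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0, -1.0])
y = X @ w_true + 0.2 * rng.normal(size=n)
blocks = [X[:, 0:2], X[:, 2:4], X[:, 4:6]]
means, covs = vertical_mean_field_regression(blocks, y)
print(np.round(np.concatenate(means), 2))
```

Because the model is Gaussian, the fixed point of these block updates coincides with the exact posterior mean of the centralized model, (alpha I + beta XᵀX)⁻¹ beta Xᵀy; the approximation shows up only in the factorized covariances. Each sweep costs one length-n vector per site, so in this sketch it is cheaper than shipping a site's raw n × d_k columns only when the number of sweeps is smaller than d_k.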
Keywords/Search Tags: Distributed data mining, Variational, Thesis, Approximation technique, Chapter