A cluster tracking algorithm for distributed data analytics

Posted on:2013-01-03

Degree:M.S

Type:Thesis

University:Rutgers The State University of New Jersey - New Brunswick

Candidate:Lasluisa, Raul S

Full Text:PDF

GTID:2458390008974575

Subject:Engineering

Abstract/Summary:

Large-scale data analytics has enabled society to model and inspect its data to the point where useful information can be extracted, conclusions can be drawn, and decision making can be enhanced. The breadth of data being analyzed today has enabled us to make proactive decisions in processes where we otherwise could not. At the same time, the data being analyzed is becoming both larger and more distributed, making it more complex to aggregate the data at a central location and process it in a timely manner in order to make decisions. This can be attributed to the scale of current distributed computational infrastructures, which are used to solve complex problems while generating an increasing amount of data. This data is created not only by the applications solving problems but also by the systems running those applications, creating a situation where the benefits of centralized data analytics decline as opposed to decentralized approaches.

Data analytics algorithms must therefore meet several new requirements in order to continue to process data in a timely manner. One approach to processing distributed data is to use algorithms that can themselves run in a distributed manner. Using such algorithms benefits a variety of situations where there is a desire to reduce the cost of transporting and subsequently storing data. Examples can be seen in autonomic computing, where the goal is to manage large systems with minimal intervention by administrators, and in scientific visualization, where visualization techniques are performed using a secondary system.

In this work we show that combining online (and distributed) data clustering with cluster tracking can be effectively used to detect meaningful changes in data patterns occurring in multiple streams. In doing so, we provide an alternative to a centralized approach where data must be centralized before any analytics may be executed. Specifically, we propose a cluster tracking algorithm which takes advantage of a decentralized clustering algorithm in order to detect changes in data and then make proactive decisions. We demonstrate its accuracy and effectiveness in three different cases: 1) VM provisioning, 2) scheduling of Hadoop resources, and 3) object tracking in scientific applications.
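
To make the idea of cluster tracking concrete, the following is a minimal sketch and not the algorithm from the thesis, whose details the abstract does not give. It assumes that some clustering step (not shown) has already produced one centroid per cluster for two consecutive time windows, and it matches clusters across windows by greedy nearest-centroid pairing. The function name `track_clusters`, the parameters `match_radius` and `move_eps`, and the event labels are all hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def track_clusters(prev, curr, match_radius=1.0, move_eps=0.1):
    """Compare cluster centroids from two consecutive windows.

    prev, curr: dicts mapping a cluster id to its centroid (tuple of floats).
    match_radius: assumed maximum distance for two centroids to count as
        the same cluster.
    move_eps: assumed minimum displacement reported as a "moved" event.
    Returns a list of (event, prev_id, curr_id) tuples.
    """
    # Every candidate pairing, sorted so the closest centroids are matched first.
    pairs = sorted(
        (math.dist(p, c), pid, cid)
        for (pid, p), (cid, c) in product(prev.items(), curr.items())
    )

    matched_prev, matched_curr, events = set(), set(), []
    for distance, pid, cid in pairs:
        if distance > match_radius:
            break  # all remaining pairs are even farther apart
        if pid in matched_prev or cid in matched_curr:
            continue  # each cluster is matched at most once
        matched_prev.add(pid)
        matched_curr.add(cid)
        if distance > move_eps:
            events.append(("moved", pid, cid))

    # Unmatched clusters are reported as having vanished or newly appeared.
    events += [("disappeared", pid, None) for pid in prev if pid not in matched_prev]
    events += [("appeared", None, cid) for cid in curr if cid not in matched_curr]
    return events


if __name__ == "__main__":
    previous_window = {"a": (0.0, 0.0), "b": (5.0, 5.0)}
    current_window = {"x": (0.5, 0.0), "y": (10.0, 10.0)}
    for event in track_clusters(previous_window, current_window):
        print(event)
```

In a decentralized setting, each node would only need to exchange centroids rather than raw data points, which is one way such a scheme could reduce data movement. How the thesis actually represents and matches clusters may differ.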

Keywords/Search Tags:

Data, Tracking, Distributed, Algorithm

Related items

1	Research On Target Tracking And Resource Management Of Distributed Radar System
2	Research On Distributed Information Fusion Algorithm
3	Parallel and distributed algorithms for data association and application to multitarget tracking
4	Study On Acoustic Source Tracking Algorithm Using Distributed Microphone Arrays
5	Study On Tracking Algorithm Of Distributed Microphone Arrays
6	Research On Multisensor Multitarget Tracking Algorithm Based On PHD Filtering
7	Performance-Optimized Detection, Tracking and Modeling of Physical Phenomena in Distributed Sensing Environments
8	Algorithm Research And Application Of Target Tracking
9	Study On Deep Learning And D-S Theory Based Speaker Tracking Algorithm In Distributed Microphone Array
10	Research On Target Tracking Algorithm For Distributed Sources