
Analysis of load distribution strategies for signature search and join operation in distributed computing systems

Posted on: 2011-09-16 | Degree: Ph.D. | Type: Dissertation
University: State University of New York at Stony Brook | Candidate: Kyong, Yuntai
GTID: 1448390002454050 | Subject: Engineering
Abstract/Summary:
Divisible Load Theory (DLT) provides an optimality criterion for scheduling computational load in distributed computing systems with communication costs. A divisible load is characterized by infinite divisibility: the computational load can be partitioned arbitrarily, and there are no dependencies between the partitions. Such a model applies whenever a large amount of data with little or no locality needs to be processed. Data of this kind is common in scientific research, for example in simulating and processing experiments involving particle accelerators, astronomical imaging and genome data. The arbitrary divisibility of the load, together with the optimality criterion developed in the DLT literature, leads to tractable algebraic solutions for a given network architecture and load distribution policy. In the DLT literature, a divisible load is characterized by its computation and communication intensities, which determine the computing speed and the cost of transferring the load. Solutions for optimal load distribution exist for a wide variety of network architectures and scheduling policies.

In the second chapter, a method for deriving, with DLT, the computing capacity of a cluster consisting of a large number of computers is examined. Instead of assuming that the whole load is available to the cluster at once, we consider the case where multiple types of load are streamed to the cluster in a stationary manner, each with its own arrival rate. We assume a bus network architecture in which the computing nodes are connected to a single dispatcher through a shared communication channel.

In the third chapter, a closed-form solution is derived for the expected search time of the kth signature in the data set. This work extends [1], where the expected time of a single signature search is obtained.
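As a hedged illustration of the kind of algebraic solution DLT yields, the Python sketch below computes the classic single-installment split for a bus network in which the dispatcher sends each node its share in turn and does not compute itself. The function names `bus_schedule`, `finish_times` and `utilization`, and the symbols `w` (inverse node speed), `z` (inverse bus speed), `Tcp` and `Tcm` (roughly, the computation and communication intensity of the load), are assumptions of this sketch, not the dissertation's notation. The optimal split makes every node finish at the same instant. The `utilization` helper is only a simple, non-pipelined stability check for stationary streams; it is not the capacity derivation of the second chapter.

```python
def bus_schedule(w, z, Tcp=1.0, Tcm=1.0):
    """Optimal single-installment DLT split of a unit load on a bus.

    w[i]: inverse computing speed of node i; z: inverse bus speed.
    The dispatcher sends shares sequentially (node 0 first) and does not
    compute. Returns (fractions, makespan).
    """
    if not w:
        raise ValueError("need at least one node")
    alphas = [1.0]
    for i in range(len(w) - 1):
        # Equal finish times: a_i * w_i * Tcp == a_{i+1} * (z*Tcm + w_{i+1}*Tcp)
        alphas.append(alphas[-1] * w[i] * Tcp / (z * Tcm + w[i + 1] * Tcp))
    total = sum(alphas)
    alphas = [a / total for a in alphas]  # normalize to a unit load
    # Node 0 receives its share first, then computes it.
    makespan = alphas[0] * (z * Tcm + w[0] * Tcp)
    return alphas, makespan


def finish_times(alphas, w, z, Tcp=1.0, Tcm=1.0):
    """Finish time of every node under sequential distribution."""
    times, sent = [], 0.0
    for a, wi in zip(alphas, w):
        sent += a  # bus is busy until all earlier shares and this one are sent
        times.append(sent * z * Tcm + a * wi * Tcp)
    return times


def utilization(streams, w):
    """Hypothetical, non-pipelined load check for stationary streams.

    streams: list of (rate, z, Tcp, Tcm) per load type, with rate in load
    units per second. In the linear model the makespan scales with load
    size, so one unit of each type occupies the cluster for its unit-load
    makespan. A value below 1 suggests the streams can be sustained.
    """
    return sum(rate * bus_schedule(w, z, Tcp, Tcm)[1]
               for rate, z, Tcp, Tcm in streams)


if __name__ == "__main__":
    alphas, T = bus_schedule([1.0, 1.0], z=1.0)
    print(alphas, T)
    print(finish_times(alphas, [1.0, 1.0], 1.0))
    print(utilization([(0.5, 1.0, 1.0, 1.0)], [1.0, 1.0]))
```

For two identical nodes with w = z = 1, the split is 2/3 and 1/3 of the load and both nodes finish at 4/3 time units; a single stream of 0.5 load units per second would then give a utilization of about 0.67 under this simplified model.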
The analysis is further extended to the search time of the kth signature in a divisible load with arbitrary statistics for the signature's location. This chapter also presents a method for shortening the signature search time by rearranging the load before distributing it to the processors. The operation model assumed here is a linear search for signatures within the load, but it opens new avenues for research when other operation models are considered. Another typical operation is the relational operation between records, when a large number of structured records is modeled as a divisible load. In the fourth chapter, an analysis of distributed join operations using divisible load theory is presented. The performance of the distributed sort-merge join and inner-loop join operations is analyzed with DLT, and the theoretical maximum number of processors that can be utilized is derived. The analysis supports both performance prediction and the development of efficient database algorithms.
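A minimal sketch of the kth-signature question, under assumptions that are ours rather than the dissertation's: n signatures are placed independently in the load, each node scans its share linearly only after receiving it in full over the bus, and "the kth signature" is read as the kth one to be found in time. `detection_cdf`, `expected_kth_time` and the `loc_cdf` parameter are hypothetical names. Instead of the closed form derived in the dissertation, the sketch integrates numerically, using E[T] = ∫ P(T > t) dt together with the binomial probability that fewer than k signatures have been found by time t. Passing a different `loc_cdf` models non-uniform location statistics; reordering the load so that likely positions are sent first is one simple way to see how rearrangement can shorten the search.

```python
import math


def detection_cdf(t, alphas, w, z, Tcp=1.0, Tcm=1.0, loc_cdf=lambda x: x):
    """P(a single signature has been found by time t).

    Node i holds load positions [A_{i-1}, A_i), where A_i = a_1 + ... + a_i.
    It starts scanning once its whole share has crossed the bus and scans
    linearly at w_i * Tcp seconds per unit load. loc_cdf is the CDF of the
    signature position in [0, 1]; the default is a uniform position.
    """
    F, left = 0.0, 0.0
    for a, wi in zip(alphas, w):
        start = (left + a) * z * Tcm  # share fully received at this time
        scanned = min(max((t - start) / (wi * Tcp), 0.0), a)
        F += loc_cdf(left + scanned) - loc_cdf(left)
        left += a
    return F


def expected_kth_time(k, n, alphas, w, z, Tcp=1.0, Tcm=1.0,
                      loc_cdf=lambda x: x, steps=20000):
    """E[time until k of n i.i.d. signatures have been found]."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Every detection happens before the last node finishes scanning.
    horizon, left = 0.0, 0.0
    for a, wi in zip(alphas, w):
        left += a
        horizon = max(horizon, left * z * Tcm + a * wi * Tcp)

    def survival(t):
        F = detection_cdf(t, alphas, w, z, Tcp, Tcm, loc_cdf)
        F = min(max(F, 0.0), 1.0)
        # P(fewer than k of the n signatures found by time t)
        return sum(math.comb(n, j) * F**j * (1 - F)**(n - j) for j in range(k))

    # E[T] = integral of P(T > t) dt, using the trapezoidal rule.
    h = horizon / steps
    total = 0.5 * (survival(0.0) + survival(horizon))
    total += sum(survival(i * h) for i in range(1, steps))
    return total * h


if __name__ == "__main__":
    alphas, w = [2 / 3, 1 / 3], [1.0, 1.0]
    print(expected_kth_time(1, 1, alphas, w, z=1.0))
    late = expected_kth_time(1, 1, alphas, w, 1.0, loc_cdf=lambda x: x * x)
    early = expected_kth_time(1, 1, alphas, w, 1.0,
                              loc_cdf=lambda x: 1 - (1 - x) ** 2)
    print(late, early)
```

With the two-node split [2/3, 1/3] and w = z = 1, a single uniformly placed signature is found after about 19/18 time units on average. If the signature tends to sit near the end of the data (`loc_cdf=lambda x: x * x`), reversing the load before distribution (`loc_cdf=lambda x: 1 - (1 - x) ** 2`) lowers the expected search time from about 1.148 to about 0.963 in this model.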
Keywords/Search Tags:Load, Computing, Distributed, Signature search, DLT, Operation, Join, Data