
Programming abstraction for resource aware stream processing for scientific workflows

Posted on: 2012-01-27
Degree: Ph.D.
Type: Thesis
University: Indiana University
Candidate: Herath, Chathura Kamalanath
Full Text: PDF
GTID: 2468390011466566
Subject: Computer Science
Abstract/Summary:
Earth-based science research has advanced significantly in recent years, due in part to advances in sensor technology that have made sensors cheaper, more reliable, and hence plentiful. As the volume of real-time data grows exponentially, analyzing every new bit of data becomes computationally impractical. Yet it remains critical to identify and process events of importance. Scientific workflows are a widely accepted programming model that allows scientists to model their scientific problems while concentrating on the science rather than on computer science details. Many scientific applications are amenable to representation as scientific workflows yet also need to incorporate external, continuous event streams. Scientific workflows today generally assume static or non-real-time data and do not adequately capture continuous behavior.

This dissertation presents a hybrid programming model that combines scientific workflows with declarative, query-based event processing. The model enables data mining of high-volume data streams and facilitates setting up gateways with much-needed features such as triggered computing, alert systems, and real-time analysis. Contributions of this thesis include a programming abstraction that preserves the simplicity and user-friendliness of scientific workflows while making event streams first-class citizens in the programming model through defined streaming semantics. The stream semantics preserve the high-level computational graph while allowing processing to proceed across the workflow system and the declarative event processing system as necessary. Further, the thesis proposes algorithms to identify and partition control flow sub-graphs in an event processing graph, with the objective of matching the run-time characteristics of each sub-graph with the proper quality of service for execution.

The thesis argues that different phases of the computation have different run-time quality of service requirements, ranging from high-throughput event processing to computationally intensive HPC applications. Finally, the thesis shows how the proposed programming model addresses the different run-time aspects of event processing applications.
Keywords/Search Tags: Processing, Programming, Scientific workflows