
Computational data space model and high-performance data management

Posted on: 2007-04-05
Degree: Ph.D.
Type: Dissertation
University: Northwestern University
Candidate: Li, Jianwei
Full Text: PDF
GTID: 1458390005988254
Subject: Computer Science
Abstract/Summary:
Vast amounts of data are creating new challenges for large-scale data-intensive applications in terms of performance and scalability. Many applications view this increasingly large amount of data as an object to be passively processed and adopt a database model to iteratively query results from the data. While this model works well for interactive applications, it incurs expensive I/O costs and processing inefficiency for data-intensive tasks.

By studying a wide range of data-intensive applications, such as scientific simulations and data mining applications, we find it more efficient to view the data as the subject, or the master, of the computation. Instead of being queried, the data can be designed to actively drive the computation. For this purpose, the computational properties of the data are first carefully studied. A computational data space is then modeled, in which the data is organized and directly computed from one state to another as a whole set. The data is not only the operand set; it also serves as the initial description, the intermediate state, and the resulting solution of the application. Based on this computational data space model, we carry out a data-driven approach that follows the data as it adaptively evolves from the initial state, through the intermediate computation, and towards the final results. With data-driven computation, the data tells us the results instead of us searching the data for results. We show that both computation and I/O can be minimized in such a computation model. We also show that this model is highly scalable for data-intensive applications in parallel environments. Finally, we provide a high-performance, portable data management package that can help application users efficiently and conveniently manage the data in computational data spaces.
Keywords/Search Tags: Computational data space, Data management, Data-intensive applications