Intelligent prefetching and caching for scientific data mining in the middleware GAMine

Posted on: 2006-04-21
Degree: M.Sc
Type: Thesis
University: University of Windsor (Canada)
Candidate: Hu, Gang
Full Text: PDF
GTID: 2458390005993735
Subject: Computer Science
Abstract/Summary:
Scientific data mining applications are widespread across scientific fields. They involve huge data sets and complicated algorithms, and are often deployed on high-performance parallel platforms. In particular, increasingly large data sets make data access the most time-consuming stage of execution. Caching and prefetching can make data access more efficient and thereby improve application performance. However, the caching and prefetching strategies of traditional OS file systems, as well as other enhanced approaches, ignore the application's runtime situation. As a result, not all data retrieval latency can be hidden, or cache units have to be larger when data access is remote.

The first step of our approach is to build a middleware, GAMine, which is independent of data sets and applications and provides a generic data access optimization strategy for scientific data mining applications. It supports both client/server and peer-to-peer architectures and has a flexible, symmetric design.

Second, within GAMine, the prefetching strategy exploits knowledge of access patterns and system parameters (latency and throughput) to set the preferred prefetch depth. In addition, GAMine can be told to select different caching policies according to the access pattern and architecture. As a result, the middleware can hide more latency and avoid cache pollution.

Finally, GAMine can monitor the data consumption rate and the data delivery rate to set the prefetch depth dynamically to the value that is optimal with respect to latency hiding and cache size. Thus, even in a dynamic situation, latency can still be hidden at any time thanks to the middleware's adaptation.
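The abstract does not give the formula GAMine uses, so the following is only a minimal sketch of how a prefetch depth could be derived from latency, delivery rate, consumption rate, and cache size. The function name `prefetch_depth`, its parameters, and the simple "blocks consumed while one request is outstanding" model are all assumptions made for illustration, not the thesis's actual method.

```python
import math


def prefetch_depth(latency_s, delivery_blocks_per_s,
                   consumption_blocks_per_s, cache_blocks):
    """Hypothetical estimate of how many blocks to keep in flight.

    Assumed model (not taken from the thesis): a request for one block
    completes after the network latency plus the time to transfer the block.
    """
    # Time from issuing a request until one block has fully arrived.
    fetch_time_s = latency_s + 1.0 / delivery_blocks_per_s
    # Blocks the application consumes while that request is outstanding.
    needed = math.ceil(fetch_time_s * consumption_blocks_per_s)
    # Keep at least one block in flight, never more than the cache can hold.
    return max(1, min(needed, cache_blocks))


# Re-evaluating the depth as the monitored rates change (illustrative values).
depth_normal = prefetch_depth(0.05, 200.0, 100.0, cache_blocks=64)
depth_fast_consumer = prefetch_depth(0.5, 200.0, 1000.0, cache_blocks=64)
```

In this sketch, calling the function again whenever the monitored consumption or delivery rate changes stands in for the dynamic adaptation described above; the cap at `cache_blocks` reflects the trade-off between latency hiding and cache size.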
Keywords/Search Tags: Data, Scientific, Prefetching, Caching, Latency, GAMine