Pattern-Aware Cache Management For Efficient Subset Retrieving Of Astronomical Image Data

Posted on: 2018-03-11  Degree: Master  Type: Thesis
Country: China  Candidate: J Wang  Full Text: PDF
GTID: 2310330542977875  Subject: Computer Science and Technology
Abstract/Summary:
FITS (Flexible Image Transport System) is the most widely used data format for archiving and interchanging astronomical image data. With the development of astronomical technology and the growing number of ongoing sky survey projects, the total amount of observation data stored in FITS format has become extremely large, and individual FITS files range in size from megabytes (MB) to gigabytes (GB), and even terabytes (TB). In most cases, however, astronomers studying a specific celestial body only need sub-areas of some FITS images selected from the huge volume of raw FITS files. As virtual observatories develop and data interchange between data centers becomes more frequent, transmitting full FITS files is both time-consuming and wasteful of I/O, and processing a full FITS file only to obtain information about the target sub-area wastes considerable computing resources. An efficient subset retrieval method for FITS files would therefore relieve astronomers of tiresome data pre-processing work and let them concentrate on the astronomical research itself.

This thesis presents a service that efficiently extracts sub-areas of FITS files, based on a storage scheme with SSD as the cache layer and a pattern-aware cache management strategy named PA. Specifically, the coordinate mapping method loads a sub FITS file with a fuzzy boundary instead of the exact one, which increases the chance of answering users' requests from SSD, and the merging strategy for similar sub FITS images saves cache space, so that more hot sub FITS files can be loaded into the cache layer and the cache hit ratio rises. Meanwhile, because the priority used for cache replacement takes both the frequency and the recency of requests into account, old cold sub-files in the cache are dropped and replaced by new hot ones, so that the cache content keeps pace with users' requests. Finally, at the end of the thesis, a PA-G cache management strategy based on request pre-processing is proposed to overcome the shortcomings of the PA strategy under overload. PA-G improves on the original PA strategy by maintaining a request queue and by grouping and merging sub-files in advance.

Experimental results show that, compared with the traditional LRU, LFU and LRFU strategies applied to full FITS files and to sub-files respectively, the PA cache management strategy maintains a relatively high cache hit ratio of 72% and achieves the shortest average response time when the ratio of cache size to raw requested data size is 23%. The optimized strategy, PA-G, reduces the time consumed by 81.8% relative to the PA strategy in the overload situation.
Keywords/Search Tags:Pattern-aware, Cache management, Sub-set retrieving, FITS file, Request pre-processing
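
The abstract does not give the formulas behind PA, so the following Python sketch is only an illustration of two ideas it names: loading cutouts with a fuzzy boundary, and ranking cached cutouts by both frequency and recency. Everything concrete here is an assumption rather than the thesis's method: the fuzzy boundary is modeled by snapping a pixel box outward to a fixed tile grid, the priority is an LRFU-style decayed hit count, capacity is counted in entries instead of bytes, and the names `snap_to_grid`, `FuzzyCutoutCache`, `tile` and `decay` are hypothetical.

```python
def snap_to_grid(x0, y0, x1, y1, tile=256):
    """Expand an exact pixel box outward to the enclosing tile boundaries.

    Assumption: a fixed tile grid stands in for the "fuzzy boundary".
    """
    fx0 = (x0 // tile) * tile
    fy0 = (y0 // tile) * tile
    fx1 = -(-x1 // tile) * tile  # ceiling division, then scale back
    fy1 = -(-y1 // tile) * tile
    return (fx0, fy0, fx1, fy1)


class FuzzyCutoutCache:
    """Toy cache keyed by (file name, fuzzy box); stores scores only, no pixels."""

    def __init__(self, capacity, tile=256, decay=0.1):
        self.capacity = capacity  # maximum number of cached cutouts (assumed unit)
        self.tile = tile
        self.decay = decay        # larger values favor recency over frequency
        self.clock = 0            # logical time: one tick per request
        self.entries = {}         # key -> (score at last access, tick of last access)

    def _current_score(self, key):
        # LRFU-style value: each past hit is worth 2 ** (-decay * age).
        score, last = self.entries[key]
        return score * 2 ** (-self.decay * (self.clock - last))

    def request(self, fits_name, box):
        """Record a request for `box` in `fits_name`; return True on a cache hit."""
        self.clock += 1
        key = (fits_name, snap_to_grid(*box, tile=self.tile))
        hit = key in self.entries
        if hit:
            # Decay the old score to the present, then add this hit.
            self.entries[key] = (self._current_score(key) + 1.0, self.clock)
        else:
            if len(self.entries) >= self.capacity:
                # Evict the cutout with the lowest combined score.
                victim = min(self.entries, key=self._current_score)
                del self.entries[victim]
            self.entries[key] = (1.0, self.clock)
        return hit


cache = FuzzyCutoutCache(capacity=2)
first = cache.request("a.fits", (10, 20, 200, 200))   # miss: cache is empty
second = cache.request("a.fits", (30, 40, 250, 250))  # same fuzzy box as the first
```

In this sketch two nearby requests against the same file share one cache entry because both boxes snap to the same tile-aligned region, which is the effect the abstract attributes to the fuzzy boundary. The merging of similar sub-files and the PA-G request queue are not modeled.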