
Research On Data Organization Of Cluster Multimedia Storage System

Posted on: 2008-10-27 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J G Wan
GTID: 1118360272966864 | Subject: Computer system architecture
Abstract/Summary:
With the explosive growth of multimedia data on the Internet, huge amounts of multimedia data have been, and continue to be, generated and shared by users around the globe. Accordingly, the demand for large-scale, expandable multimedia storage systems is increasing dramatically. Furthermore, multimedia data place unique demands on storage systems. In a distributed network environment, a large number of clients access the servers simultaneously, which puts even more rigorous performance requirements on the servers. Targeting the traits and requirements of multimedia data, and deploying an autonomous server cluster architecture, we designed a Cluster Multimedia Storage System (CMSS). More specifically, we studied its data organization algorithms, including CMSS metadata management, data organization and migration, and multimedia caching.
CMSS employs a TLMS (Two-Level Metadata Server) algorithm. The TLMS algorithm separates the logical view of the data from the physical view. The logical view is managed by a Global Metadata Server (GMS). The physical view is managed by Local Metadata Servers (LMSs) on the individual storage servers. By using online fail-over technology, the GMS implements a single name space without introducing a single point of failure. With the help of a global metadata caching technique, the request handling process is much simplified, which reduces the load on the GMS and improves system performance. By adopting the LMS technique, each storage server can autonomously manage its private storage resources as well as its metadata and actual data. In addition, each storage server can provide independent storage services. Furthermore, the CMSS two-level metadata management avoids the metadata performance bottleneck exhibited by traditional centralized metadata server solutions.
It also solves the consistency and synchronization overhead problems found in distributed metadata management solutions.
To combine high performance with scalability, we analyzed traditional distributed and parallel data organizations and designed an AutoData algorithm, which uses a multi-level data management architecture and shares the advantages of distributed and parallel data organizations without bearing their deficiencies. The entire storage hierarchy consists of three layers: an in-memory parallel storage pool layer, an on-disk parallel storage pool layer, and an on-disk distributed storage pool layer. The in-memory parallel storage pool is made up of a certain amount of memory from each storage server. Similarly, the on-disk parallel storage pool is made up of a small amount of disk space from each storage server. Finally, the rest of the servers' disk space forms the on-disk distributed storage pool. The in-memory parallel storage pool is the smallest in size and achieves the best performance. The on-disk parallel storage pool has relatively lower performance and larger capacity. The on-disk distributed storage pool has the lowest performance and the largest capacity.
According to our analysis, even though the accesses from each individual client are sequential, when multiple clients access the servers at the same time, the mixed accesses from multiple clients exhibit a random access pattern. To reduce the number of random accesses to the disks, we designed a CBP (Client-Based Prefetching) algorithm. By reserving a certain amount of buffer space for prefetching and adopting large prefetching block sizes, we reduced the number of disk accesses and improved system performance.
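The effect described for client-based prefetching can be illustrated with a minimal sketch. It is not the CBP implementation from the dissertation: the function names, the round-robin arrival order, the three clients, and the block counts are all assumptions chosen for illustration. Each client gets its own prefetch buffer, and one disk read fills it with a run of contiguous blocks.

```python
from itertools import zip_longest

def interleave(streams):
    """Round-robin merge of per-client request lists, as a server might see them."""
    merged = []
    for group in zip_longest(*streams):
        merged.extend(item for item in group if item is not None)
    return merged

def count_disk_reads(requests, prefetch_blocks):
    """Count disk reads when every client has its own prefetch buffer.

    requests: list of (client_id, block_number) in arrival order.
    prefetch_blocks: number of contiguous blocks fetched by one disk read
                     (hypothetical parameter; 1 means no prefetching).
    """
    buffers = {}  # client_id -> range of block numbers currently buffered
    disk_reads = 0
    for client, block in requests:
        buffered = buffers.get(client)
        if buffered is None or block not in buffered:
            # Buffer miss: one large read brings in the next run of blocks.
            buffers[client] = range(block, block + prefetch_blocks)
            disk_reads += 1
    return disk_reads

# Three hypothetical clients, each reading 16 contiguous blocks from a
# different region of the disk.
clients = {c: list(range(start, start + 16)) for c, start in enumerate([0, 1000, 2000])}
requests = interleave([[(c, b) for b in blocks] for c, blocks in clients.items()])

# Each client is sequential, but the merged stream jumps between regions.
print(requests[:4])

no_prefetch = count_disk_reads(requests, prefetch_blocks=1)
with_prefetch = count_disk_reads(requests, prefetch_blocks=8)
print(no_prefetch, with_prefetch)
```

In this toy setup every request costs a disk read when nothing is prefetched, while a per-client buffer of 8 blocks turns each client's stream into a few large sequential reads.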
For cache replacement, we exploited the high predictability of multimedia accesses to design a forecast-based optimal cache replacement algorithm, namely Forecast OPT (FORT). We implemented FORT by predicting future access addresses based on the contiguity of multimedia accesses.
We evaluated and analyzed CMSS performance in a Gigabit Ethernet environment. We first compared single-server performance under CMSS and NFS. In general, for random reads, the CMSS server performs slightly worse than the NFS server, but for sequential reads it performs 20% better than NFS. We also tested the performance of CMSS, Lustre, and PVFS in a multi-server parallel storage environment. The results show that, for random reads, CMSS performs better than Lustre but worse than PVFS. For sequential reads, however, CMSS outperforms Lustre and PVFS by 30-40%. These results reveal the effectiveness of the CMSS optimizations for sequential reads.
We also compared the hit rates of the FORT and LRU replacement algorithms by simulating a multiple-client system. The results show that the hit rates of FORT and LRU are similar and relatively high when the request size is below 64KB, which demonstrates the effectiveness of adopting the 64KB prefetching block size. When the request size reaches 64KB, regardless of the number of clients, the hit rate of FORT is 50-70% higher than that of LRU, further illustrating the effectiveness of the FORT algorithm.
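The idea of forecasting future accesses from contiguity can be sketched as follows. This is a toy illustration, not the FORT algorithm or the simulation from the dissertation: the two-client trace, the cache capacity, and the helper names are assumptions. The forecast policy assumes each client will next read the blocks just after its current position, and evicts the cached block whose predicted next use is farthest away (or never).

```python
from collections import OrderedDict

def make_trace(num_blocks, gap):
    """Two clients stream the same file; the follower trails the leader by `gap` blocks."""
    trace = []
    for step in range(num_blocks + gap):
        if step < num_blocks:
            trace.append(("leader", step))
        if step >= gap:
            trace.append(("follower", step - gap))
    return trace

def simulate_lru(trace, capacity):
    """Hit rate of a plain LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for _, block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits / len(trace)

def predicted_distance(block, positions):
    """Blocks until some client reaches `block`, assuming contiguous reads."""
    ahead = [block - pos for pos in positions.values() if pos < block]
    return min(ahead) if ahead else float("inf")  # inf: no client will reach it

def simulate_forecast(trace, capacity):
    """Hit rate when evicting the block with the farthest predicted next use."""
    cache = set()
    positions = {}  # client -> block it read most recently
    hits = 0
    for client, block in trace:
        positions[client] = block
        if block in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            victim = max(cache, key=lambda b: predicted_distance(b, positions))
            cache.remove(victim)
        cache.add(block)
    return hits / len(trace)

# Hypothetical workload: 200-block file, follower 8 blocks behind, 12-block cache.
trace = make_trace(num_blocks=200, gap=8)
lru_rate = simulate_lru(trace, capacity=12)
fort_rate = simulate_forecast(trace, capacity=12)
print(f"LRU: {lru_rate:.3f}  forecast-based: {fort_rate:.3f}")
```

In this trace the forecast policy keeps exactly the blocks lying between the follower and the leader, so every follower read hits, whereas LRU keeps recently consumed blocks and evicts those the follower still needs. The numbers are a property of this toy workload only and are not comparable to the hit rates reported above.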
Keywords/Search Tags: Storage system, Multimedia, Cluster, Distribute, Metadata