
Research On Data Organization Algorithms Of Massive Object Storage Systems

Posted on: 2007-12-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Luo
Full Text: PDF
GTID: 1118360242961905
Subject: Computer system architecture
Abstract/Summary:
In network storage systems, optimizing data organization is an effective way to improve system performance. A suitable data organization algorithm can improve the performance of the storage system while providing reliability, availability, and scalability. Data organization in a network storage system has two aspects: organization within the storage space and organization along the data transfer path. The former exploits the parallelism of the storage nodes to improve the I/O performance of the storage system, and improves data reliability and availability by placing copies of data on different storage nodes; the latter configures suitable caches and designs replacement algorithms to improve I/O performance. This dissertation studies data organization algorithms and related issues according to the characteristics of object storage systems, as follows.

What distinguishes an object storage system from other network storage systems is that it separates the storage management of the file system from its user management. Storage management is carried out at the storage nodes, while user management is implemented at the metadata server. This separation of function makes the metadata server a thin server, which helps the storage system scale. The object interface carries rich semantics, which can provide security for the storage system and QoS-based I/O services for applications. The object storage system supports two data transfer modes: the NAS mode and the third-party transfer mode.
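The third-party transfer mode described above can be sketched as follows: the thin metadata server resolves an object to a storage node, and the client then exchanges data directly with that node. This is a minimal illustrative sketch; the class and method names are assumptions, not interfaces from the dissertation.

```python
class MetadataServer:
    """Thin server: holds only the object -> storage-node mapping (user management)."""
    def __init__(self, placement):
        self.placement = placement  # object_id -> index of the storage node

    def locate(self, object_id):
        return self.placement[object_id]


class StorageNode:
    """Storage node: manages its own objects locally (storage management)."""
    def __init__(self):
        self.objects = {}

    def write(self, object_id, data):
        self.objects[object_id] = data

    def read(self, object_id):
        return self.objects[object_id]


def client_read(object_id, mds, nodes):
    # Third-party transfer: a lightweight metadata lookup, then the data
    # itself moves directly between client and storage node.
    node_idx = mds.locate(object_id)
    return nodes[node_idx].read(object_id)
```

Because the metadata server handles only the small lookup, adding storage nodes scales the data path without growing the metadata server's load proportionally.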
Integrating the transfer mode with the cache scheme can improve the performance of the storage system.

Organizing data in the storage space of an object storage system means distributing data objects among storage nodes according to the nodes' capacities. The relevant algorithms must minimize time and space overhead and, at the same time, adapt as the storage system is extended. To meet these goals, one algorithm is designed to distribute data objects as users access them, and another is designed to migrate data objects as the storage system is extended. These algorithms have the following characteristics: the storage nodes are divided into groups; data objects are distributed among the groups according to group capacity and uniformly within each group; a mapping function maps the user space to the storage space, minimizing time and space overhead; adaptability is achieved by recording information about extensions of the storage system; and the migration algorithm minimizes the amount of migrated data.

Applications require the storage system to provide QoS-based I/O services, but data migration can degrade the system's QoS. After analyzing the model of QoS-based I/O services, we define an extra reward for the migration task and divide the task into small migration requests. On this basis, a QoS-based data migration model is established, and a migration algorithm is designed to maximize the migration reward. Experiments indicate that this migration algorithm has less influence on I/O QoS than other commonly used migration algorithms.

Data organization along the data transfer path involves cache replacement algorithms, and the standard for evaluating a cache algorithm is whether it improves the performance of the storage system.
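The group-based placement just described can be illustrated with a small sketch: an object is first assigned to a group in proportion to group capacity, then spread uniformly over that group's nodes by a hash-based mapping function. The hash choice and function names are assumptions for illustration; the dissertation specifies only the capacity-weighted, per-group-uniform behavior.

```python
import hashlib

def _h(key):
    # Deterministic hash used as the mapping function from user space
    # to storage space (no per-object table, so space overhead is minimal).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def place(object_id, groups):
    """groups: list of (capacity, [node_ids]). Returns the node holding the object."""
    total = sum(cap for cap, _ in groups)
    # Step 1: pick a group in proportion to its capacity.
    point = _h(object_id) % total
    for cap, nodes in groups:
        if point < cap:
            # Step 2: distribute uniformly over the nodes within the group.
            return nodes[_h("g:" + object_id) % len(nodes)]
        point -= cap
```

Placement is computed, not stored, so lookups cost O(number of groups) time and the system needs no central placement table; recording group changes on extension is what would make this adaptive, as the abstract notes.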
Analyzing the speedup ratio of the storage system leads to the following conclusion: the factors influencing storage system performance include not only the cache hit ratio but also the time for a data object to access the storage devices. Based on this conclusion, we design two cache replacement algorithms: the LAT (Least Access Time) algorithm and the WLFRU (Weighing Least Frequently Recently Used) algorithm. LAT selects as the replacement victim data objects with shorter device access times and lower hit ratios; WLFRU weighs access frequencies so as to account for both access locality and the penalty of accessing a data object on the devices. Both algorithms outperform the usual LRU algorithm.

Configuring caches along the data transfer path must take into account the characteristics of accesses to the storage system; the three entities involved have different characteristics, so their cache schemes differ. Because reading and writing at the storage nodes is slow, we configure a pre-read buffer and a write buffer there: the former shortens read response time, and the latter responds quickly by delaying writes to disk. The cache schemes at the metadata server and the clients are related to the transfer mode. The cache at the metadata server stores some smaller data objects, since its load is lighter; at the clients, the memory cache serves data objects under both transfer modes, while the disk cache stores only data objects using third-party transfer. All of these measures aim to reduce network traffic. Experiments show that these three cache schemes can greatly improve the performance of the storage system.
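The LAT idea above can be sketched as a cache that, on eviction, prefers the object whose miss would be cheapest to repair: short device access time and few hits. The concrete scoring rule (hits × access time) is an assumption for illustration; the abstract states only which two factors LAT weighs.

```python
class LATCache:
    """Minimal sketch of a Least-Access-Time replacement policy (illustrative)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # object_id -> (data, device_access_time, hit_count)

    def get(self, object_id):
        if object_id in self.entries:
            data, t, hits = self.entries[object_id]
            self.entries[object_id] = (data, t, hits + 1)  # record the hit
            return data
        return None  # miss

    def put(self, object_id, data, access_time):
        if object_id not in self.entries and len(self.entries) >= self.capacity:
            # Evict the object whose re-fetch penalty is smallest:
            # low hit count AND short device access time.
            victim = min(self.entries,
                         key=lambda k: self.entries[k][2] * self.entries[k][1])
            del self.entries[victim]
        self.entries[object_id] = (data, access_time, 1)
```

Unlike LRU, this keeps an object that is expensive to re-read from a slow device even if it is referenced less often, which matches the conclusion that device access time, not hit ratio alone, drives overall performance.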
Keywords/Search Tags: Object storage, I/O performance, Scalability, Data organization, Data migration, Cache scheme, Replacement algorithm