
The Research On Technologies Of Replica Management In Data Grid Environment

Posted on: 2012-01-09 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W Q Zhao
GTID: 1228330344451678 | Subject: Computer system architecture
Abstract/Summary:
Data grid is a framework built on heterogeneous, distributed environments to address the difficulties of organizing and processing massive data. It is an efficient solution for data sharing and collaborative problem solving over wide area networks (WANs), providing uniform access, storage, transfer, management and services for massive data. Data grid technology has greatly advanced both scientific research on massive data management and engineering practice. It can serve as an efficient way to manage and share the massive data produced in scientific fields such as biomedicine, astronomy and high-energy physics. Data replication is introduced into data grids to improve data availability, enhance system performance and reduce network usage, which in turn raises the problem of replica management. Researchers have sought efficient replica management approaches suited to the characteristics of data grids in order to improve overall system performance, and the topic has become a hotspot in data grid research.

In this thesis, we first give an overview of data grid concepts, their research background and current progress, and then present the research goals in this field. We then review related work on replica management models and techniques in the data grid environment. Based on an analysis and comparison of state-of-the-art work, we investigate key replica management techniques to improve system performance, scalability and adaptability. Specifically, we develop a grouping-based replica creation strategy, a value-based replica replacement strategy and a prediction-based replica consistency maintenance strategy.
These strategies help improve system performance.

The main contributions of this thesis are as follows:

1. Because the data grid environment is both distributed and dynamic, we first group the system nodes. We then develop a grouping-based replica creation strategy (GBRC), which covers when to create a replica and how to place it (CBRP). GBRC first decides when to create a replica of a single file in the shared memory array SS, based on how often the nodes in SS request that file. It then places the replica at a suitable location in SS, taking into account the access frequency of the nodes and the network bandwidth. Simulation results validate the correctness and efficiency of GBRC.

2. Users have different access patterns, and file transfers load the network. To address this, we propose a replica replacement model and a value-based replica replacement strategy (VRRS). VRRS estimates how likely a file is to be accessed in the future from the file's access history, giving more weight to recent accesses. Combining this estimate with the access cost yields a value for each file. The replica with the least value is then removed, and removal repeats until the node has enough free space. Experiments show that VRRS outperforms the strategies it is compared with.

3. We first analyze and summarize the shortcomings of existing work on replica consistency maintenance in the data grid environment. We then develop a prediction-based replica consistency maintenance strategy (PRCS). PRCS keeps the master replicas consistent with one another through active updating and uses file locking to avoid update conflicts. When a master replica is updated, PRCS maintains the consistency of the secondary replicas through prediction-based passive updating. It predicts both when a secondary replica will next be accessed and when the master replica will next be updated. If users are expected to access the secondary replica before the master replica's next update, the secondary replica is updated so that it can serve that access. Simulation results show that PRCS is efficient.
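The two GBRC decisions in item 1 (when to create a replica and where to place it) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm. The request-count threshold (`CREATE_THRESHOLD`) is hypothetical. So is the cost model, which is request frequency times file size divided by pairwise bandwidth. The function names `should_create_replica` and `choose_placement` are hypothetical too. SS is modeled here simply as the set of node names in one group, with a full bandwidth matrix.

```python
from __future__ import annotations

# Hypothetical threshold: create a replica inside the group once the
# group's nodes have requested the file at least this many times.
CREATE_THRESHOLD = 5


def should_create_replica(request_counts: dict[str, int],
                          group_has_replica: bool,
                          threshold: int = CREATE_THRESHOLD) -> bool:
    """Creation timing: replicate when group-wide demand reaches the threshold."""
    return not group_has_replica and sum(request_counts.values()) >= threshold


def choose_placement(request_counts: dict[str, int],
                     bandwidth: dict[str, dict[str, float]],
                     file_size: float) -> str:
    """Placement: pick the group node minimizing frequency-weighted transfer time.

    bandwidth[a][b] is the (assumed symmetric) bandwidth between nodes a and b;
    every node in the group is a candidate location.
    """
    def cost(candidate: str) -> float:
        total = 0.0
        for node, freq in request_counts.items():
            if node != candidate:  # requests from the hosting node itself are local
                total += freq * file_size / bandwidth[candidate][node]
        return total

    return min(bandwidth, key=cost)


# Example group with three nodes; bandwidth in MB/s, file size in MB.
requests = {"A": 1, "B": 6, "C": 2}
bw = {
    "A": {"B": 10.0, "C": 100.0},
    "B": {"A": 10.0, "C": 10.0},
    "C": {"A": 100.0, "B": 10.0},
}
if should_create_replica(requests, group_has_replica=False):
    print(choose_placement(requests, bw, file_size=100.0))  # prints B
```

In this example B is chosen even though its links are the slowest, because it issues most of the requests. The weighted costs are 62 for A, 30 for B and 61 for C.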
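The VRRS eviction loop in item 2 can be sketched in Python as shown below. It is a minimal sketch under assumptions. The abstract states only that recent accesses weigh more and that the estimated access likelihood is combined with the access cost. The exponential decay weight (`decay`), the per-replica `fetch_cost` used as the access cost, the product used to combine them, and all names (`Replica`, `replica_value`, `select_victims`) are hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    size: float                 # MB occupied on the local node
    fetch_cost: float           # hypothetical cost of re-fetching from a remote site
    access_times: list[float] = field(default_factory=list)


def access_score(times: list[float], now: float, decay: float = 0.5) -> float:
    """Recency-weighted access count: an access at time t contributes decay**(now - t)."""
    return sum(decay ** (now - t) for t in times)


def replica_value(r: Replica, now: float, decay: float = 0.5) -> float:
    """Value = estimated future demand x cost of losing the local copy (assumed product)."""
    return access_score(r.access_times, now, decay) * r.fetch_cost


def select_victims(replicas: list[Replica], free_space: float, needed: float,
                   now: float, decay: float = 0.5) -> list[str] | None:
    """Evict lowest-value replicas until `needed` MB are free.

    Returns the names to evict, or None if evicting everything would still
    not free enough space (in which case nothing should be evicted).
    """
    if free_space + sum(r.size for r in replicas) < needed:
        return None
    victims = []
    for r in sorted(replicas, key=lambda r: replica_value(r, now, decay)):
        if free_space >= needed:
            break
        victims.append(r.name)
        free_space += r.size
    return victims


stored = [
    Replica("f1", size=40, fetch_cost=2.0, access_times=[9, 10]),
    Replica("f2", size=30, fetch_cost=5.0, access_times=[1, 2, 3]),
    Replica("f3", size=50, fetch_cost=1.0, access_times=[8]),
]
print(select_victims(stored, free_space=10, needed=60, now=10))  # ['f2', 'f3']
```

Here f2 has the highest fetch cost but was last accessed long ago. Its decayed score therefore gives it the lowest value, and it is evicted first.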
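The secondary-replica decision in item 3 can be sketched in Python as below. The abstract does not say how access and update times are predicted. This sketch therefore assumes a simple predictor: the last event time plus the mean interval between past events. The names `predict_next` and `should_push_update` are hypothetical. The decision rule follows the abstract. After a master update, the secondary replica is refreshed only if its next access is predicted to come before the master's next update.

```python
from __future__ import annotations

from statistics import mean


def predict_next(times: list[float]) -> float:
    """Hypothetical predictor: last event time plus the mean past interval.

    `times` must be sorted ascending. With fewer than two events there is no
    interval to average, so the next event is treated as never arriving.
    """
    if len(times) < 2:
        return float("inf")
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    return times[-1] + mean(intervals)


def should_push_update(secondary_accesses: list[float],
                       master_updates: list[float]) -> bool:
    """Called right after a master-replica update.

    Push the new content to the secondary replica now if a read is expected
    before the master changes again; otherwise defer, since the pushed copy
    would be superseded by the next master update before anyone reads it.
    """
    return predict_next(secondary_accesses) < predict_next(master_updates)


# Secondary read every 2 time units, master updated every 10: push now.
print(should_push_update([0, 2, 4, 6], [0, 10, 20]))   # True
# Secondary read every 10 units, master updated every 3: defer.
print(should_push_update([0, 10, 20], [0, 3, 6]))      # False
```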
Keywords/Search Tags: Data Grid, Replica Management, Replica Creation, Replica Replacement, Consistency Maintenance