Research On Data Placement And Replication Strategy In Cloud Computing

Posted on:2016-07-28

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W Guo

Full Text:PDF

GTID:1108330461985405

Subject:Computer software and theory

Abstract/Summary:


With the continuous development and spread of information technology, the traditional model of enterprise IT construction no longer suits many enterprises, especially small and medium-sized ones, because it requires large infrastructure investment, long application development cycles, and high maintenance costs. Cloud computing shares software and hardware resources over the Internet and provides them to computers and other devices on demand. It places applications and data in a network resource pool composed of many inexpensive computers and devices, so that users can obtain computing power, storage space, and information services on demand. In cloud computing environments, related work usually adopts replication to improve system reliability, availability, and scalability. Data and their replicas are stored in a shared database, so users need to care neither about where the data and replicas are located nor about how many replicas exist. At the same time, a single data node supports only a limited data capacity. When users need more storage space and higher quality of service, upgrading storage hardware alone makes it difficult to scale out the data nodes dynamically. Data management in cloud computing is shifting from a single data node to multiple data nodes, and the management of cloud data and replicas is accordingly becoming an increasingly important research topic.
A good data placement strategy should jointly consider storage cost, bandwidth consumption, the replication method, and system load balance, in order to ensure data reliability and availability and to improve both the performance of the cloud computing system and the quality of its services. This dissertation studies data placement and replication strategies in cloud computing to support horizontal scaling and unified management of cloud data, to maintain a good data placement, and to keep cloud applications running efficiently. Existing data placement and replication strategies in cloud computing cannot effectively solve the following problems:
(1) Initial data placement in the cloud computing environment. The initial placement strategy is very important because it determines the efficiency of data management for a long period afterward. How to place massive data rationally across data nodes, so as to reduce data transmission between nodes, is therefore a priority problem for cloud data placement policy. If the initial placement is unreasonable, the cost of distributed transactions increases and the effective computing power of the cloud drops sharply.
(2) Determining the replica number. Replica strategies based on historical access frequency are a dynamic replica management mechanism that adjusts the number of replicas as the access frequency fluctuates. However, such strategies take into account neither the cost of distributed transactions nor the cost of creating replicas. As a result, the replica number is set coarsely, without fine-grained management. With an appropriate fine-grained replica management strategy, read and write operations can be balanced across a large number of partitions.
A fine-grained replica strategy is therefore needed to control the cost of distributed updates and to adapt to different workloads; a finer-grained view of access patterns can determine an appropriate replica number for each data item.
(3) Dynamic migration of data replicas. As operations on cloud data continue, a data node that was once balanced may become unbalanced, and the popularity of different replicas may also change. A cloud data management model is therefore needed that detects workload changes and provides dynamic data placement and migration mechanisms, so as to resolve these problems and make full use of cloud computing resources.
(4) Fast location of data replicas for transaction requests. Existing cloud data management models are usually unable to locate precisely the data replicas needed to serve a given transaction request. Likewise, when application data grow beyond a single data node, an operation may span multiple nodes, which degrades system performance and user experience. A fast replica location strategy for transaction requests is therefore needed to improve replica location efficiency and to let the cloud computing platform achieve efficient access and high throughput.
This dissertation focuses on these key problems of data placement and replication in cloud computing. Its main contributions are as follows:
1. An initial data placement strategy for the cloud computing environment. The strategy considers the collaboration cost of distributed transactions between data replicas and reduces the cost of distributed transactions as far as possible, taking into account that different distributed transactions have different costs.
It also considers load balance across the whole data center and improves on an existing greedy algorithm so that the placement quickly converges to an effective solution.
2. A fine-grained data replica mechanism, designed around the characteristics of data management in cloud computing, to ensure a good user experience and good overall performance of the cloud computing platform. The replica number is managed at the level of tuple sets, which gives better control over the cost of distributed updates, improves system throughput, and adapts to different workloads. This lets the system better handle query workloads with different read/write access patterns. Choosing the replica number to suit a given workload improves replica efficiency and significantly reduces the cost of distributed updates. The strategy uses different replica granularities to serve query workloads with different mixes of reads and writes. Experimental results show that, under different types of workloads, fine-grained replica management significantly reduces the average cost of range queries and greatly improves transaction throughput in cloud computing.
3. A dynamic, adaptive replica migration strategy for the cloud computing environment. Built on a dynamic scheduling mechanism for cloud storage resources, it achieves higher scalability, improves fault tolerance, and responds better to workload changes. The mechanism monitors the workload to decide whether to increase or decrease the replica number, and adds or deletes replicas accordingly, so as to improve the efficiency of concurrent operations.
When monitoring detects a major workload change, the strategy adjusts the placement in small steps and eventually reaches a good overall partitioning. Through dynamic replica migration, cloud data can move between data nodes, keeping every data node load-balanced and the platform running efficiently, and ultimately ensuring a better user experience.
4. A fast replica location mechanism for transaction requests, built on the data placement and replica number strategies above, to further improve data access performance. Taking user access requests as the basic unit of data access and refining them progressively, the mechanism can quickly return the query result set. Based on the query span, it applies a standard greedy algorithm to locate replicas: for each data partition it computes the size of the intersection with the query subset, selects the partition with the largest intersection, removes all items contained in that partition from the query subset, and repeats until the query subset is empty. Treating this as an instance of the minimum set cover problem, the procedure selects a small set of partitions that together cover the query subset. The strategy locates the correct replicas quickly and efficiently and achieves better overall performance.
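Contribution 1 describes a greedy initial placement that weighs distributed transaction cost against global load balance. The exact cost function is not given in the abstract, so the following is only a minimal sketch under assumed inputs: each data item has a size, each transaction is a set of items with a hypothetical weight standing for the cost incurred if it becomes distributed, and all nodes share one capacity. The function name `greedy_initial_placement` and the `balance_weight` parameter are illustrative, not taken from the dissertation.

```python
from collections import defaultdict


def greedy_initial_placement(items, transactions, num_nodes, capacity, balance_weight=1.0):
    """Greedily assign each data item to one node (hypothetical sketch).

    items:        dict item -> size
    transactions: list of (frozenset_of_items, weight); the weight is an assumed
                  cost paid when that transaction spans more than one node
    capacity:     maximum total item size per node (same for every node)
    """
    touching = defaultdict(list)  # item -> transactions that access it
    for txn_items, weight in transactions:
        for it in txn_items:
            touching[it].append((txn_items, weight))

    # Place the most transaction-heavy items first.
    order = sorted(items, key=lambda it: -sum(w for _, w in touching[it]))
    ideal_load = sum(items.values()) / num_nodes
    load = [0] * num_nodes
    placement = {}

    for it in order:
        best_node, best_score = None, float("inf")
        for node in range(num_nodes):
            if load[node] + items[it] > capacity:
                continue  # node would overflow
            # Weight of transactions whose already-placed items sit on another node.
            cross_cost = 0.0
            for txn_items, weight in touching[it]:
                others = {placement[o] for o in txn_items if o != it and o in placement}
                if others - {node}:
                    cross_cost += weight
            # Load-balance penalty: prefer nodes that stay below the ideal load.
            score = cross_cost + balance_weight * (load[node] + items[it]) / ideal_load
            if score < best_score:
                best_node, best_score = node, score
        if best_node is None:
            raise ValueError(f"no node has room for item {it!r}")
        placement[it] = best_node
        load[best_node] += items[it]
    return placement


items = {"a": 1, "b": 1, "c": 1, "d": 1}
txns = [(frozenset("ab"), 5.0), (frozenset("cd"), 5.0)]
print(greedy_initial_placement(items, txns, num_nodes=2, capacity=2))
# {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```

Placing heavily co-accessed items first lets lighter items follow partners that are already placed. Raising `balance_weight` spreads items more evenly at the price of more distributed transactions.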
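Contribution 2 sets the replica number per tuple set according to its read/write mix. As a hedged illustration only, the sketch below assumes a toy cost model that is not from the dissertation: reads are spread evenly over k replicas (cost `reads / k`), every write must update all k replicas (cost `writes * k`), and each newly created replica costs a fixed `create_cost`. `TupleSetStats`, `replica_cost`, and `choose_replica_number` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TupleSetStats:
    """Observed workload of one tuple set in a monitoring window (assumed fields)."""
    reads: float
    writes: float
    current_replicas: int = 1


def replica_cost(stats, k, create_cost=1.0):
    # Reads share k replicas; each write updates all k replicas (distributed
    # update cost); replicas beyond the current count must be created.
    read_cost = stats.reads / k
    update_cost = stats.writes * k
    creation = create_cost * max(0, k - stats.current_replicas)
    return read_cost + update_cost + creation


def choose_replica_number(stats, max_replicas, create_cost=1.0):
    """Return the replica count in [1, max_replicas] with the lowest modeled cost."""
    return min(range(1, max_replicas + 1),
               key=lambda k: replica_cost(stats, k, create_cost))


workload = {"hot_readonly": TupleSetStats(900, 10), "write_heavy": TupleSetStats(10, 100)}
plan = {name: choose_replica_number(s, max_replicas=10) for name, s in workload.items()}
print(plan)  # {'hot_readonly': 9, 'write_heavy': 1}
```

Under this model, read-heavy tuple sets receive many replicas while write-heavy ones stay at a single copy, which reflects the idea that per-tuple-set replica numbers keep distributed update cost under control. A real system would calibrate the cost terms from measurements.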
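Contribution 3 migrates replicas in small steps when workload monitoring detects imbalance. The sketch below covers only the migration step and assumes hypothetical inputs: a map from each node to the set of data items it holds a replica of, and a per-item load observed in the last monitoring window (assumed equal for every replica of that item). At most `max_moves` replicas move per call, from the most loaded to the least loaded node, and only moves that shrink the gap between those two nodes are allowed. `rebalance_step` and its parameters are illustrative names.

```python
def rebalance_step(node_replicas, replica_load, threshold, max_moves=1):
    """Move at most max_moves replicas from the hottest to the coolest node.

    node_replicas: dict node -> set of item ids replicated there (modified in place)
    replica_load:  dict item id -> observed load of one replica of that item
    threshold:     tolerated load gap between the most and least loaded node
    Returns the list of (item, source, target) moves performed.
    """
    def node_load(n):
        return sum(replica_load[r] for r in node_replicas[n])

    moves = []
    for _ in range(max_moves):
        hot = max(node_replicas, key=node_load)
        cold = min(node_replicas, key=node_load)
        gap = node_load(hot) - node_load(cold)
        if gap <= threshold:
            break  # balanced enough; avoid needless migration
        # Moving load x turns the gap into |gap - 2x|, which shrinks only if 0 < x < gap.
        # Skip items the cold node already replicates.
        candidates = [r for r in node_replicas[hot]
                      if r not in node_replicas[cold] and 0 < replica_load[r] < gap]
        if not candidates:
            break
        r = min(candidates, key=lambda r: abs(gap - 2 * replica_load[r]))
        node_replicas[hot].remove(r)
        node_replicas[cold].add(r)
        moves.append((r, hot, cold))
    return moves


nodes = {0: {"a", "b", "c"}, 1: {"d"}}
load = {"a": 50, "b": 30, "c": 20, "d": 10}
print(rebalance_step(nodes, load, threshold=15, max_moves=3))  # [('a', 0, 1)]
```

Calling `rebalance_step` once per monitoring window with a small `max_moves` approximates the small-step behavior described above: the placement converges gradually instead of being reshuffled at once, and `threshold` keeps minor fluctuations from triggering migrations.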
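The replica location procedure in contribution 4 is the standard greedy algorithm for set cover and can be written directly in Python. The sketch assumes each partition is represented by the set of item ids it stores replicas of; `locate_partitions` is a hypothetical name, and the length of its result is taken here as the transaction's query span. Greedy set cover is not guaranteed to find the true minimum, but its result is within a logarithmic factor of it.

```python
def locate_partitions(query_items, partitions):
    """Greedy set cover: pick partitions until every queried item is covered.

    query_items: iterable of item ids requested by a transaction
    partitions:  dict partition id -> set of item ids it holds replicas of
    Returns the chosen partition ids in selection order.
    """
    remaining = set(query_items)
    chosen = []
    while remaining:
        # Partition with the largest intersection with the uncovered query subset.
        best = max(partitions, key=lambda p: len(partitions[p] & remaining))
        covered = partitions[best] & remaining
        if not covered:
            raise ValueError(f"items {sorted(remaining)} are not stored in any partition")
        chosen.append(best)
        remaining -= covered  # drop the items this partition already serves
    return chosen


partitions = {"P1": {1, 2, 3}, "P2": {3, 4}, "P3": {4, 5, 6}, "P4": {1, 6}}
chosen = locate_partitions({1, 2, 3, 4, 5, 6}, partitions)
print(chosen, "span =", len(chosen))  # ['P1', 'P3'] span = 2
```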

Keywords/Search Tags:

cloud data management, data placement, data replica, dynamic migration, replica location


Related items

1	Research Of Replica Location And Replica Placement For Massive Data
2	Research On Optimization Of Big Data Storage Replica Strategy In Cloud Environment
3	Research Of Replica Management In Data Grid
4	Research And Experiment About The Data Replica Placement Algorithm In Cloud Storage System
5	Research On Distributed Replica Location Model In ChinaGrid Data Management
6	Research On Replica Placement And Selection Strategies In Heterogeneous Cluster Storage System For Big Data
7	Study And Application Of Synchro Data Replica Between Grid Nodes
8	Research On Dynamic Management Of Data Replicas In Heterogeneous Hadoop Cluster
9	The Research On Technologies Of Replica Management In Data Grid Environment
10	Research On Dynamic Management Of Data Replicas In Heterogeneous Hadoop Clusters