
Research On Cloud Storage Of District-Sensitive Data

Posted on: 2015-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zhu
Full Text: PDF
GTID: 2298330467963160
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and the growth of data scale, cloud storage has been widely used and studied for its performance and high reliability. Cloud storage technology has matured and can integrate many low-cost hosts into a powerful computer cluster through virtualization. However, as Internet applications become increasingly specialized, cloud storage still has considerable room for improvement. How to optimize a cloud storage system to suit different kinds of Internet applications, and thereby improve its performance, has become one of the focuses of cloud storage research.

Recently, Internet applications based on district-sensitive data have developed rapidly. The data in such applications has a clear regional characteristic: its users are concentrated in a particular district. How to use this regional characteristic to optimize the performance of a cloud storage system is therefore a valuable topic. This paper studies storage technology for district-sensitive data and covers the following aspects.

1. Research and analysis of current cloud storage technologies and systems. This paper mainly studies the HDFS infrastructure, its file I/O mechanisms and some key technologies.

2. Optimizing the HDFS infrastructure. For district-sensitive data, this paper analyzes the shortcomings of the HDFS infrastructure and optimizes it. Cache nodes are deployed in HDFS near the users, so that the user client can read data directly from the cache nodes, which reduces network transmission costs.

3. Optimizing the HDFS load balancing strategy. We propose a Certainty, Multi-stage and Multi-object (CMM) decision model to solve load balancing problems. The CMM decision model takes the remaining load capacity of CPU, memory and disk as a prerequisite, and takes the effect of load balancing, load migration costs and data transfer costs as the decision-making objectives. Based on a number of decision nodes designed in this paper and the impact of these decision nodes, this paper builds a directed acyclic graph. The model divides the decision-making process into several stages and, through these stages, determines a set of alternative load balancing schemes. We then calculate the effectiveness of each scheme from the evaluations of the decision objectives and their weights, and finally select the optimal load balancing scheme according to its effectiveness.

4. Optimizing the HDFS data placement strategy. This optimization has two goals. First, when a data file is written into HDFS, the proposed strategy replaces the default HDFS data placement policy. Based on the idea of differential probability, it assigns each Datanode a different selection probability according to its remaining capacity, so that workload is assigned more equitably. Second, we design a cache management mechanism to manage the data on cache nodes: hot data is placed on the cache nodes when they have enough capacity, and unpopular data is periodically deleted from them.

5. Simulation and result analysis. To validate the cloud storage technology proposed in this paper, we developed a simulation platform based on CloudSim and conducted simulations. The results show that the proposed technology is better suited to storing district-sensitive data, with clear advantages in aspects such as I/O speed and the effect of load balancing.
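The capacity-weighted Datanode selection described in item 4 might look roughly like the following Python sketch. The abstract does not give the probability formula, so the sketch assumes selection probabilities proportional to remaining capacity. The function names, the node names, and the rule that replicas go to distinct nodes are illustrative assumptions, not the thesis's implementation and not part of the HDFS API.

```python
import random
from typing import Dict, List, Optional


def selection_probabilities(remaining: Dict[str, float]) -> Dict[str, float]:
    """Map each Datanode to a probability proportional to its remaining capacity.

    Assumption: proportional weighting; the thesis may use a different formula.
    """
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no Datanode has remaining capacity")
    return {node: cap / total for node, cap in remaining.items()}


def choose_datanodes(
    remaining: Dict[str, float],
    replicas: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick `replicas` distinct Datanodes, favouring those with more free space."""
    rng = rng or random.Random()
    # Nodes with no free capacity are never candidates.
    candidates = {node: cap for node, cap in remaining.items() if cap > 0}
    if replicas > len(candidates):
        raise ValueError("not enough Datanodes with free capacity")

    chosen: List[str] = []
    for _ in range(replicas):
        nodes = list(candidates)
        weights = [candidates[node] for node in nodes]
        # Weighted draw: a node with twice the free space is twice as likely.
        pick = rng.choices(nodes, weights=weights, k=1)[0]
        chosen.append(pick)
        # Assumption: each replica of a block goes to a different node.
        del candidates[pick]
    return chosen
```

For example, `choose_datanodes({"dn1": 300.0, "dn2": 100.0, "dn3": 50.0}, replicas=2)` returns two distinct node names, with `dn1` the most likely first pick. Drawing without replacement keeps the replicas on separate nodes while still biasing new writes toward nodes with more free space.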
Keywords/Search Tags:District-Sensitive Data, Cloud Storage, Load Balancing, Data Placement