
A Data Placement Algorithm Based On Data Dependence

Posted on: 2014-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: W Dong
GTID: 2268330401956364
Subject: Computer application technology
Abstract/Summary:
With the rapid development of cloud computing, massive amounts of data are collected on cloud servers, which provide users with remote application services such as storage, retrieval, and computation. On cloud servers, data operations are often the bottleneck, so designing and optimizing data placement algorithms to improve memory utilization and data access speed is an important research topic in cloud computing. Given the explosive growth of data and the need for disaster recovery, massive data cannot reside in a single data center. Data placement algorithms must therefore work not only across local servers within one data center but also across data centers. The former problem can be regarded as tactical-level data placement optimization, and the latter as strategic-level data placement optimization. This thesis focuses on the strategic level and the corresponding data placement algorithms. The main research work is as follows:

(1) A data placement algorithm based on clustering by data dependence, DPBDD (Data Placement Based On Data Dependence), was designed. The many applications on cloud servers tend to use data from multiple sources, and they also share some data with one another. This many-to-many relationship between applications and data produces complex relationships among the data themselves. Existing data placement algorithms consider only the load balance of data centers, not the correlation between data. When an application runs across data centers, traditional algorithms often cause a large amount of data movement and reduce the efficiency of data access. This thesis first defines a meta application as an application that cannot be divided further; data used by the same meta application are considered related. It then builds a data correlation matrix and transforms it into a clustered correlation matrix using the bond energy algorithm (BEA).
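The abstract does not say how the correlation matrix is weighted or which BEA variant is used, so the following is only a minimal Python sketch under stated assumptions. Each meta application is given as a list of the data item indices it uses, and the correlation between two data items is taken to be the number of meta applications that use both (a hypothetical weighting). The ordering step follows the classic bond energy algorithm, which inserts each column at the position that adds the most bond energy; the thesis's own variant may differ. The function names `correlation_matrix`, `bond`, and `bea_order` are illustrative.

```python
from itertools import combinations


def correlation_matrix(n_data, meta_apps):
    """aff[i][j] = number of meta applications that use both data i and data j.

    Assumed weighting: the abstract only says data used by the same meta
    application are related, not how strongly.
    """
    aff = [[0] * n_data for _ in range(n_data)]
    for used in meta_apps:
        items = sorted(set(used))
        for i in items:
            aff[i][i] += 1  # diagonal: how many meta applications use item i
        for i, j in combinations(items, 2):
            aff[i][j] += 1
            aff[j][i] += 1
    return aff


def bond(aff, x, y):
    """Bond between columns x and y; None stands for a missing neighbour (bond 0)."""
    if x is None or y is None:
        return 0
    return sum(aff[z][x] * aff[z][y] for z in range(len(aff)))


def bea_order(aff):
    """Classic bond energy algorithm: insert each column where it adds the most bond."""
    n = len(aff)
    if n <= 2:
        return list(range(n))
    order = [0, 1]
    for k in range(2, n):
        best_pos, best_cont = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            # contribution of placing column k between left and right
            cont = 2 * bond(aff, left, k) + 2 * bond(aff, k, right) - 2 * bond(aff, left, right)
            if best_cont is None or cont > best_cont:
                best_pos, best_cont = pos, cont
        order.insert(best_pos, k)
    # the matrix is symmetric, so rows are permuted with the same order
    return order


if __name__ == "__main__":
    aff = correlation_matrix(4, [[0, 2], [1, 3], [0, 2]])
    print(bea_order(aff))  # -> [2, 0, 3, 1]
```

In the usage example, meta applications 0 and 2 both use data items 0 and 2, while meta application 1 uses items 1 and 3, so the ordering places item 0 next to item 2 and item 3 next to item 1.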
After this transformation, data with higher correlation lie closer together in the matrix, so the clustered correlation matrix can be divided into groups and the data in different groups distributed to different data centers. A simulation experiment compares DPBDD with the consistent hashing algorithm and with an algorithm based on data center capacity clustering. The results show that DPBDD significantly improves the efficiency of data movement.

(2) For incremental data, a DPBDD-k algorithm based on K-means was designed. Massive data on cloud servers keep growing rapidly. DPBDD suits initial static planning and structural-upgrade planning of cloud systems, but a cloud system must also place newly arriving data across its data centers sensibly. DPBDD-k takes the K classes produced by DPBDD as the K-means cluster centers, computes the dependence between a new data item and each of the K cluster centers, and stores the new item in the data center with the highest dependence. Simulation experiments compare DPBDD-k with placement algorithms based on the nearby principle. The results show that DPBDD-k has an advantage in data movement when new data are added.
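A similar sketch can illustrate the division step and the DPBDD-k assignment. It is not the thesis's implementation, and several details are assumptions: each data item is represented by a 0/1 vector recording which meta applications use it; the correlated ordering is cut into K contiguous groups at its K-1 weakest neighbour links (a hypothetical division rule that ignores data center capacity); a cluster center is the mean vector of a group, as in K-means; and the dependence between a new item and a center is their dot product. Only the single nearest-center assignment described in the abstract is shown, without further K-means iterations. All function names are hypothetical.

```python
def dot(u, v):
    """Dependence between two vectors (for 0/1 usage vectors: shared meta applications)."""
    return sum(a * b for a, b in zip(u, v))


def split_at_weakest_links(usage, order, k):
    """Cut an ordering into k contiguous groups at the k-1 weakest neighbour links.

    Hypothetical division rule; requires 1 <= k <= len(order).
    """
    links = sorted((dot(usage[order[i]], usage[order[i + 1]]), i) for i in range(len(order) - 1))
    cuts = sorted(i for _, i in links[:k - 1])
    groups, start = [], 0
    for i in cuts:
        groups.append(order[start:i + 1])
        start = i + 1
    groups.append(order[start:])
    return groups


def cluster_centers(usage, groups):
    """K-means style center of each group: the mean usage vector of its members."""
    m = len(usage[0])
    return [[sum(usage[d][a] for d in g) / len(g) for a in range(m)] for g in groups]


def place_new_data(new_usage, usage, groups):
    """Return the group (data center) index whose center the new item depends on most."""
    centers = cluster_centers(usage, groups)
    scores = [dot(new_usage, c) for c in centers]
    return max(range(len(centers)), key=scores.__getitem__)


if __name__ == "__main__":
    # usage[d][a] = 1 if meta application a uses data item d (3 meta applications)
    usage = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 1, 0]]
    # an ordering in which correlated items are adjacent (e.g. from BEA)
    groups = split_at_weakest_links(usage, [2, 0, 3, 1], 2)
    print(groups)  # [[2, 0], [3, 1]]
    dc = place_new_data([0, 1, 0], usage, groups)  # new item used only by meta application 1
    print(dc)  # 1
    # record the new item in its group so later placements see it
    usage.append([0, 1, 0])
    groups[dc].append(len(usage) - 1)
```

Cutting at the weakest neighbour links keeps strongly correlated runs of the ordering in the same data center; a real deployment would likely also need to respect data center capacity, which this sketch ignores.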
Keywords/Search Tags: data placement, clustering, data dependence, K-means