
Research On Data Placement Strategy And Skyline Query In Cloud Environment

Posted on: 2015-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: H S Jiang
GTID: 2298330467956856
Subject: Computer technology
Abstract/Summary:
With the deepening of informatization in scientific applications, social studies, commerce, and daily life, efficient storage and querying of massive data are becoming increasingly important. In climate science, manned spaceflight, high-energy physics, life sciences, and other scientific research, as well as in business computing fields such as Web applications and social networks, the reasonable placement and real-time querying of massive data have become key problems to be solved.

The pace of computer hardware development is limited by the nature of the materials themselves, which creates a bottleneck for the computing and storage capacity of computers. Distributed and parallel computing has become an important way to process huge amounts of data, and cloud computing, a new computing model, has arisen. Cloud computing has been hailed as a "revolutionary" computing model. Cloud computing centers distributed across the Internet offer high-speed, secure data transfer. With large-scale distributed clusters as its main body, cloud computing uses virtualization technology to turn storage and computing resources located in different geographical positions into a virtual resource pool, and provides the ability to store, analyze, and process huge amounts of data.

In a cloud computing environment, the massive data needed by all kinds of applications is stored in different data centers. How to efficiently access and query data distributed across different data centers is a key problem for ensuring system performance. Therefore, a reasonable placement strategy and an efficient query algorithm are vital for reducing the movement of data sets across different data centers.

Skyline query is an important type of query that is widely applied in multiple-criteria decision making, data visualization, navigation systems, geographic information systems, etc.
Along with the explosive growth of data, the cloud computing platform is becoming an effective way to process Skyline queries over large data. Because the Skyline query result set is far smaller than the original input data, effectively filtering the initial input data and reducing data transmission across different data centers affect not only the global execution speed but also the computation cost.

To address the two problems of data placement and Skyline query under the cloud computing model, this thesis mainly completes the following work:

(1) A two-phase data placement strategy for the cloud environment is proposed. Specifically, the existing data dependencies are extended by defining the dual dependency between data and applications. At the same time, we consider the bandwidth of each data center and load balance. We conduct extensive experiments, and the experimental results demonstrate the effectiveness of our methods.

(2) A grid Skyline query processing algorithm is proposed. Specifically, an algorithm based on MapReduce is first proposed, and then the optimized version of SQBDFG is further presented. Both use grid partitioning and exploit the relationships between grid cells to filter data quickly, in order to reduce transmission overhead. Experiments in a Hadoop environment verify that the proposed algorithms have excellent performance.
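The idea of grid-based filtering for Skyline queries can be illustrated with a small sketch. This is not the thesis's algorithm (the abstract does not give its details). It is a single-process Python illustration under stated assumptions: smaller values are better in every dimension, the grid uses a uniform cell width, and the names `dominates`, `cell_of`, `cell_dominates`, and `grid_skyline` are hypothetical. The "map", "filter", and "reduce" comments only indicate where such steps might sit in a MapReduce job.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one.

    Assumption: smaller values are preferred in all dimensions.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def cell_of(p: Point, width: float) -> Cell:
    """Index of the uniform grid cell that contains point p."""
    return tuple(int(x // width) for x in p)


def cell_dominates(c: Cell, d: Cell) -> bool:
    """True if every point in cell c dominates every point in cell d.

    This holds when c's index is strictly smaller in every dimension.
    """
    return all(a < b for a, b in zip(c, d))


def grid_skyline(points: List[Point], width: float) -> List[Point]:
    # "Map" step: bucket the points by grid cell.
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        cells[cell_of(p, width)].append(p)

    # Filter step: discard whole cells dominated by another occupied cell,
    # so their points never need to be transmitted or compared.
    occupied = list(cells)
    survivors = [c for c in occupied
                 if not any(cell_dominates(o, c) for o in occupied)]

    # "Reduce" step: exact Skyline over the points of the surviving cells.
    candidates = [p for c in survivors for p in cells[c]]
    return sorted(p for p in candidates
                  if not any(dominates(q, p) for q in candidates))


if __name__ == "__main__":
    data = [(1, 9), (3, 3), (9, 1), (4, 4), (15, 15), (25, 30), (12, 2)]
    print(grid_skyline(data, width=10))
```

In this sketch, the points (15, 15) and (25, 30) are dropped at the cell level without any point-to-point comparison, which is the kind of early filtering that reduces the data passed on to later stages.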
Keywords/Search Tags:cloud computing, Hadoop, data placement, Skyline query