Research On Fault-Tolerant Parallel Skyline Query Technology In Cloud Computing Environment

Posted on: 2012-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2218330362960129
Subject: Computer Science and Technology
Abstract/Summary:
As an effective solution to multi-objective optimization problems, the Skyline query is widely used in areas such as multi-criteria decision making and user preference queries. However, as the amount of information that we can obtain and use grows rapidly, processing Skyline queries over massive data is becoming an urgent problem. Cloud computing provides the computing power and storage capacity needed for massive data processing, and can therefore handle Skyline queries efficiently. Nevertheless, frequent failures in datacenters pose great challenges to the Skyline query. For example, once a query is affected by a failure, the results are incorrect and the tasks are aborted, which wastes computing resources and degrades the query experience of users. Existing work on parallel Skyline algorithms focuses on improving response time, progressiveness, load balancing and so on, without considering how to handle queries while failures are occurring. Accordingly, we study the fault-tolerant parallel Skyline query on horizontally and vertically partitioned datasets.

To address failures of participants in master-slave structured parallel Skyline query algorithms on horizontally partitioned datasets, we propose a Master-slave based Fault-Tolerant Parallel Skyline Query Algorithm on Horizontally partitioned dataset (MsFTPS-Hpd for short). In MsFTPS-Hpd we make k duplicates of each participant. The coordinator receives the query request from the user and forwards it to all participants. The coordinator then waits for the local Skylines from the participants and periodically sends heartbeat messages to each participant. When a participant receives the request, it begins to compute its local Skyline and periodically saves the intermediate results to a reliable node. When the local computation is finished, the participant returns its local Skyline set to the coordinator. After the coordinator has received all local Skyline sets, it computes the Skyline over them again, and the result is the global Skyline. MsFTPS-Hpd requires the participants to respond to the heartbeat messages. If a participant does not respond to a heartbeat message, it is assumed to have failed. The coordinator then chooses a duplicate of the failed participant and moves its computing tasks to the duplicate, which recovers the failure. The theoretical analysis and the experimental results show that MsFTPS-Hpd provides good fault tolerance at low cost, and that its response time remains stable as the number of failed participants increases.

To address failures of the coordinator in master-slave based parallel Skyline query algorithms on horizontally partitioned datasets, we propose a Fully Distributed Fault-Tolerant Parallel Skyline Query Algorithm on Horizontally partitioned dataset (FDFTPS-Hpd for short). In FDFTPS-Hpd there is no coordinator. Every participant has the same role in the query, and the participants can communicate with each other. After the user sends a request to one of the participants (called the request launcher), the request launcher forwards the query request to all other participants. When a participant receives the query request, it begins to compute its local Skyline and, when the computation is over, sorts the local Skyline points by their dominating ability. Once all the local processing is over, each participant selects its top-k Skyline points and forwards them to all other participants. When the iteration is finished, each participant holds a complete global Skyline set. FDFTPS-Hpd creates k duplicates of each participant, so that failures are quickly detected and recovered. The theoretical analysis and the experimental results show that FDFTPS-Hpd can implement the fault-tolerant parallel Skyline query with better response time and query efficiency.

To address failures of the slaves in parallel Skyline query algorithms on vertically partitioned datasets, we propose a Fault-Tolerant Parallel Skyline Query Algorithm on Vertically partitioned dataset (FTPS-Vpd for short). In FTPS-Vpd we make k duplicates of each slave. The master receives the query request from the user and forwards it to all slaves. The master then waits for the local Skylines from the slaves and periodically sends heartbeat messages to each slave. When a slave receives the request, it partitions its local dataset into d equal subsets, computes the local Skyline, and periodically saves the intermediate results to a reliable node. When the local computation is over, the slave returns its local Skyline set to the master. After the master has received all local Skyline sets, it computes the Skyline over them again, and the result is the global Skyline. FTPS-Vpd requires the slaves to respond to the heartbeat messages. If a slave fails to respond to a heartbeat message, it is assumed to have failed. The master then chooses a duplicate of the failed slave and moves its computing tasks to the duplicate, which recovers the failure. The theoretical analysis and the experimental results show that FTPS-Vpd has better fault tolerance as well as improved response time and query efficiency.
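
The master-slave flow on a horizontally partitioned dataset can be illustrated with a minimal Python sketch. It is not the thesis implementation. The names `Participant`, `dominates`, `skyline` and `global_skyline` are hypothetical, smaller values are assumed to be preferred in every dimension, the Skyline is computed with a naive quadratic scan, and an `alive` flag stands in for a real heartbeat exchange. Periodic saving of intermediate results and all network communication are omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption for this sketch: smaller values are preferred.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive Skyline: keep each distinct point that no other point dominates."""
    unique = list(dict.fromkeys(points))  # drop exact duplicates, keep order
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


@dataclass
class Participant:
    """A node holding one horizontal partition (hypothetical structure)."""
    name: str
    partition: List[Point]
    alive: bool = True  # stands in for "answers the heartbeat message"

    def heartbeat(self) -> bool:
        return self.alive

    def local_skyline(self) -> List[Point]:
        return skyline(self.partition)


def global_skyline(groups: List[List[Participant]]) -> List[Point]:
    """Coordinator side: one group per partition, primary first, then duplicates."""
    candidates: List[Point] = []
    for group in groups:
        # Fail over to the first node in the group that answers the heartbeat.
        worker = next((p for p in group if p.heartbeat()), None)
        if worker is None:
            raise RuntimeError("all duplicates of a partition have failed")
        candidates.extend(worker.local_skyline())
    # Computing the Skyline again over the local Skylines gives the global one.
    return skyline(candidates)


if __name__ == "__main__":
    part_a = [(1, 9), (3, 3), (4, 4)]
    part_b = [(2, 8), (9, 1), (5, 5)]
    groups = [
        [Participant("a0", part_a, alive=False), Participant("a1", part_a)],
        [Participant("b0", part_b), Participant("b1", part_b)],
    ]
    print(sorted(global_skyline(groups)))
```

In this example the primary holding the first partition is marked as failed, so the coordinator takes the local Skyline from its duplicate, and the global Skyline is the same as in a failure-free run.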
Keywords/Search Tags: Cloud Computing, Data Center, Parallel Computing, Skyline Query, Fault-Tolerance Query