Font Size: a A A

Study On Multi-Objective Scheduling Algorithms Of Hybrid Cloud

Posted on:2016-01-02Degree:DoctorType:Dissertation
Country:ChinaCandidate:Q Z LiangFull Text:PDF
GTID:1108330473954907Subject:Earth Exploration and Information Technology
Abstract/Summary:PDF Full Text Request
Cloud computing is a evolution of parallel computing, distributed computing, and grid computing. Task scheduling is one of the basic problem on cloud computing, which according to the needs of the task, allocates different tasks to the right resources nodes using the proper policies in the cloud. In the cloud computing systems, task scheduling, also known as task mapping, is the way that maps tasks to data clusters combined with a large number of heterogeneous nodes. All tasks must be the allocated to compatible computing nooes considering network bandwidth, performance of each node, computing cost, and so on. This has proved to be a non-deterministic polynomial (non-deterministic polynomial, NP).Cloud computing model has three main categories:private cloud, public cloud and hybrid cloud. Community cloud is shared by several organizations and is supported by a specific community that has shared interests and concerns (security, compliance, jurisdiction, etc.), whether managed internally or by a third-party and hosted internally or externally. The costs are spread over fewer users than a public cloud (but more than a private cloud), so only some of the cost savings potential of cloud computing are realized. Private cloud keeps fine data security, high service quality, and lower single times calculation cost. Public cloud realizes the key concept of sharing the services and infrastructure provided by an offsite, third-parry service provider in a multi-tenant environment. Hybrid cloud is a composition of two or more clouds (private, community or public) that remain unique entities but are bound together, offering the benefits of multiple deployment models. Hybrid clouds lack the flexibility, security and certainty of in-house applications. Hybrid cloud provides the flexibility of in house applications with the fault tolerance and scalability of cloud based services.In hybrid cloud, tasks scheduling faces new challenges. First of all, hybrid cloud often contains a huge number of cloud computing node, so that task scheduling is NP for a large-scale optimization problems. Secondly, the differences between lots of users in hybrid cloud, lead to variety of tasks. Different tasks are real-time bandwidth-aware, computing cost-aware, QoS-aware, or both, or above of all. Finally, users submit job potential in the form of scattered, leading to task scheduling on cloud platforms, must reflect the dynamic resources allocation problem of the of each node. Therefore, in hybrid cloud computing, task scheduling needs to solve two problems:one is the great complexity of scheduling due to the number of nodes increasing; and the second is problem dimensions of scheduling optimization increasing because of more heterogeneous nodes and users requirements for submitting a job factors are on the rise, so that calculation of. Therefore, task scheduling on hybrid cloud is a large-scale multi-objective optimization problems.According to the evolution both at home and abroad, task scheduling on hybrid cloud faces the following problems:(1) Resource management is very difficult.Due to large-scale heterogeneous resources in the cloud, resource management becomes more difficult. With the development of technologies such as networking, smart terminal, different networks such as wireless sensor networks, mobile networks and different scale clouds, have access to a public network as a part of a hybrid cloud. These huge amount of cloud resources are distinctive and heterogeneity. From the research, taking into account the task characteristics, work is concentrated in homogeneous dynamic allocation of resources or a single feature automatic clustering resource and so on. For these large heterogeneous resource organization and management, further research is needed.(2) Distributing optimization is difficult.Distributing task to large-scale heterogeneous resources become more difficult. Due to the hybrid cloud has a mass of heterogeneous resources, when using a heuristic algorithm for tasks scheduling, computational efficiency is greatly restrained. However, using online methods for scheduling, it can get fast distributing, but beyond the optimal scheduling with multi-objective. Therefore, task scheduling on a hybrid cloud, must be considered the task characteristics and influence of resource heterogeneity.(3) Multi-objective and dynamic problem increases the difficulty of scheduling.Most of current task scheduling algorithms are considered to a single object (such as QoS, the calculation of costs, etc.) problem. Therefore, these methods such as weight calculation, decision tree approach, optimize tasks scheduling combining with distributing. When it must meet multiple objects, the dimensions of these usual algorithms will be enhancing, and more difficult solving. Moreover, resource releasing of tasks is also a dynamic problem effecting task scheduling strategy.This paper studies on task scheduling in hybrid cloud. We abstracts it into a large scale multi-objective scheduling problem on heterogeneous resource. In this paper, task scheduling in hybrid could is divided into three parts:clustering heterogeneous resources according to their attributes to isomorphic resource pools, distributing tasks to resource pools according to the similarity between them, and solving multi-objective scheduling of tasks in a resource pool. These work include:(1) Studying heuristic algorithms for solving multi-objective scheduling problems in hybrid cloud.Task scheduling problem in hybrid cloud is abstracted to a multi-objective optimization model in this paper. According to this model, we study a single object differential evolution algorithm--GaDE algorithm first of all. This algorithm improve the jDE algorithm in its crossover, selection and adjustment of control parameters, so that GaDE can retain the most effective genetic information with the lest number of simulation. Moreover, a partial order relation is given, so that it can be used for the constraint.In order to better deal the multi-objective task scheduling optimization in hybrid clouds, on the basis of the GaDE, combining with the non-dominated sorting algorithm, NSGA_Ⅱ, based on Pareto optimum of quick sorting method, we present a multi-objective algorithm--NSjDE algorithm. This algorithm also makes considerations to reduce the frequency of evaluation Comparing with experiment of Min-Min algorithm, GaDE algorithm and NsjDE algorithm, results show that for the single object task scheduling, GaDE and NsjDE algorithms perform better in getting the approximate optimal solution. The optimization speed of multi-objective NSjDE algorithm is faster than the single-objective jDE algorithm, and NSjDE can produce more than one non-dominated solution meeting the requirements, in order to provide more options to the user.(2) Studying collaborative filtering based multi-objective online scheduling algorithms in hybrid cloud.Task submission in hybrid cloud has dispersion and dynamic. To solve this problem, we propose a online task scheduling algorithm based on collaborative filtering. Personalized recommendation technology is introduced to this task scheduling algorithm, as recommending resources to the tasks. The algorithm uses the similarity matrix calculation task appetite for resources to achieve the purpose of allocating tasks to the resources. In the algorithm, the marking method is based on task and resource property, while considering the requirements of multiple objects, we propose a multi-objective weighted marking method for resources similar to the way of multi-objective aggregate functions. In order to improve the accuracy of predicting, a multiple linear regression method is given in predicting model to improve collaborative filters. By experimenting with GaDE and NsjDE algorithms for tasks scheduling, results show that, although collaborative filtering based scheduling algorithm perform worse in getting shortest completion time and minimal computation cost scheduling solution compared to heuristic optimization algorithm of single/multiple object, but its computing time is much faster with strong expansibility. In addition, because of the marking model considering conflicting objectives (cost and the shortest completion time) of negative evaluation, the scheduling solutions from the collaborative filtering based multi-objective online scheduling algorithm are able to get a better balance between multiple objects. The collaborative filtering based scheduling algorithm wins in computing speed and multi-objective scheduling performance, and it is an available choice for online task scheduling in hybrid clouds.(3)Studying parallel programming model for matrix multiplication parallelization under Map-Reduce method.Founded by definition of matrix multiplication, steps in matrix multiplication calculation are independent of each other, which can be calculated in parallel. When designing a MapReduce-based convolution, in order to reduce the storage space overhead, sparse matrix storage method is used. Map stage has a large number of copies of the work leading to lower efficiency. The experiment shows that the algorithm for sparse matrix effects are better, rather than less effective for dense matrices. When the matrix size gets larger, the execution time of the algorithm increasing significantly, that is not suitable for larger matrices.MapReduce framework for distributed file cache can be stored in the cache of the computing node, making map and reduce functions can be read at running time. In this paper, we propose a matrix multiplication algorithm based on distributed caching. In this algorithm, the right matrix is stored in the node cache, while left matrix data is only read and distributed in the map stage. In reduce stage, the right matrix data is read and computed with the data received from the left matrix for the final results. Compared with MapReduce-based matrix multiplication algorithm, this algorithm complexity is greatly reduced. Because there is only left matrix data distributing in the map stage, without any copy work, network overhead is small in the shuffle stage. In reduce stage, two algorithms complexity is the same, and the time overhead is similar. Experiments show that this algorithm performs calculation efficiency improvement of MapReduce based matrix multiplication. The matrices of size has no effect on the execution time of this algorithm, indicating that the universality of the algorithm is better.(4) Studying for GA based multi-label classification algorithms for rating matrix’s dimensions in collaborative filtering based tasks scheduling algorithms.K-nearest neighbor (KNN) algorithm is especially helpful to inclusive features variable filter and it is simple, easier to implement, supporting incremental learning, capable of many complex decision space modeling of deformation. KNN is usually used in multi-label classification. KNN algorithm by calculating the distance between samples is the core principle to reach record label, while distance is calculated in reasonable distance formula according to record properties. Because of this reason, KNN relies heavily on property value. For KNN classification algorithm, Many features in a dataset, which refers to property value when calculating, are redundant or irrelevant, and this will led to deviation of distance calculation, that will effects classification of accuracy when redundancy is high.In this paper, we propose a PSO based features weighted KNN algorithm-PSOKNN. With the algorithm, the precision calculation of KNN classifier is used as the adapted function of PSO algorithm, and put the value vector of the weighted KNN of features property as a particles in PSO algorithm. Setting multiple value vector randomly will form particle swarms, then a particle with optimal adapted value, meaning optimal weight vector, will be found by changing the speed and distance of particles. Each dataset can create an optimal weight vector, and KNN classifier for each data set can achieve a very high classification accuracy that forms a high precision adaptive classifier. The experimental results show that the PSOKNN algorithm for classification can get good results on several classic test data sets.Deficiencies in this work are:(1) Both of two heuristic algorithms have not parallelization. Two heuristic algorithms with paralleling in a cloud environment, should be promoted for the computing efficiency. But because the evolutionary algorithm is computationally intensive, research on its parallel computing model is currently one of the new directions. So that this article does not work in it. In addition, the optimization model is so idealized, while the actual task scheduling process does not necessarily meet all of the information defined in the task model. This is the main factor restricting the practicability of these algorithms.(2) Marking model in collaborative filtering based scheduling algorithm could be improved. In the collaborative filtering based scheduling algorithm, we use an equal weight of each property component marking according to formula 3-5. It does not consider impacts from different scheduling object caused by the tendency of different components, so that there is still great room for improvement. In addition, the marking method of this collaborative filtering based scheduling algorithm depends on the detailed task demanding for resources. But it is hardly to know in the actual scheduling. The algorithm can be improved to use an existing task information to make recommendations.(3) MapReduce computation model based matrix multiplication algorithm still needs more improvement. With MapReduce-based convolution, dealing sparse matrix of which storage is less expensive, copying time overhead in the map stage is not large, and copying time overhead and matrix size increases exponentially instead when dealing dense matrix. So copying time can be reduced from overhead, and distributing copies to more compute nodes can enlarged copy parallel. With cache based method, the main job in the map stage is simply to distribute data, however, different compute nodes have different computing resources. Combining the scheduling algorithm can maximize efficiency of map, and avoid waiting so long for a task in the map stage, resulting in idle resources.(4) The GA based multi-label classification algorithm also has regardless of extensibility. On one hand, due to the use of heuristics method to optimize classifiers, when the data size increases, the computing time will increase rapidly. We can try to implement parallel classification base on MapReduce to reduce the computing time. The other hand, the simulated data using in experimenters is difference to the actual data, and the accuracy and completeness of the actual data may not meet the training needs of PSO algorithm section.Optimization of scheduling on hybrid cloud computing efficiency has a key role. Multi-label classification theory and method used for large-scale heterogeneous resources in hybrid cloud is discussed in this paper. We analyze the relevance of resource properties, and study a multi-label classification algorithm following establishing representation model of heterogeneous resource. We study fast distribution method for dynamic tasks basing on online recommendation theory and method. We analyze effects on distribution strategy from task needs, and establish task preference and matching model. A task property based dynamic task distributing algorithm is proposed. Using multi-objective optimization theory and method to solve multi-objective scheduling problems are studied. After analyzing the effect of task demands on resource allocation, a multi-objective optimization model for hybrid cloud task scheduling is established, and a multi-objective optimization scheduling algorithm is proposed. All of experiments show that algorithms proposed in this paper are effective to solve the task scheduling problem in hybrid cloud.
Keywords/Search Tags:Hybrid Cloud, Task Scheduling, Multi-objective Opimization
PDF Full Text Request
Related items