
Research on Large-scale Task Processing for Big Data in Cloud Computing

Posted on: 2020-04-17 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: C Y Wu | Full Text: PDF
GTID: 1368330575981196 | Subject: Computer system architecture
Abstract/Summary:
With the expansion in scale and diversification of data types, big data analysis and processing has become a research hotspot. In today's informationized, digital society, the rapid development of the Internet, the Internet of Things, and cloud computing has flooded every field with data and turned data into a new natural resource. Using these data rationally and efficiently to discover valuable information can improve the efficiency of both life and work. As data volumes grow rapidly and data structures become more complex, big data analysis and processing faces many challenges. Big data imposes higher requirements on the scale, real-time performance, and effectiveness of data processing, which calls for changing traditional data analysis and processing methods and models according to the characteristics of big data, so that data collection, storage, management, analysis, processing, and the other key stages can be carried out efficiently. In recent years, the rise and rapid development of cloud computing has provided important support and guarantees for big data analysis and processing. The elasticity and scalability of cloud computing facilitate multi-user sharing as well as big data storage and processing, and the general big data pipeline of data acquisition, preprocessing, analysis, and mining can all be realized on top of cloud computing.

As users submit ever more numerous and varied data processing tasks, large-scale task processing for big data in cloud computing has gradually become an important issue in big data analysis and processing. Research on this topic covers task partitioning, allocation, and scheduling at the basic level, and task-related data analysis and processing at the application level. To process large-scale task requests efficiently, task partitioning, task allocation, and task scheduling at the basic level are prerequisites for the whole task processing process. This dissertation focuses on the key issues of large-scale task processing for big data in cloud computing. It first studies a large-scale task processing approach for big data in multi-domain environments; it then studies a large-scale task processing approach oriented to the long-term benefit of the system; finally, it studies the problems that, in practical applications, large-scale task processing models fail to recognize and process some task requests and tend to over-fit. The main contributions of this dissertation are as follows:

(1) Existing large-scale task processing methods for big data are surveyed, and the background, significance, and key points of the current research are expounded. Several important problems in large-scale task processing are introduced and analyzed, together with a detailed discussion of solutions to the key problems in task processing, including the application of large-scale image processing, large-scale wireless data processing, and other related methods to task processing research.

(2) Large-scale task processing in multi-domain environments is studied. Many traditional single-domain task processing methods based on cloud computing are limited by the types of substrate resources, their prices, storage locations, and similar factors. To overcome this limitation, this dissertation proposes LTBM, a multi-objective optimization based large-scale task processing approach for multi-domain environments. By balancing load across domains, LTBM improves resource utilization, promotes coordination among working nodes, and improves overall system performance; by optimizing communication resources, it reduces the bandwidth resource cost and data delay of cross-domain data transmission. To jointly optimize load balancing and the cost of communication bandwidth resources, a multi-domain virtual network mapping algorithm based on multi-objective particle swarm optimization is proposed. A fast and effective non-dominated selection method based on Pareto dominance theory quickly obtains the set of optimal virtual network mapping schemes; a crowding degree comparison method then selects the final unique solution; and Cauchy mutation is used to escape local optima. Experimental results show that LTBM effectively reduces the extra consumption of computing and bandwidth resources and greatly reduces task processing time.

(3) A large-scale task processing approach oriented to the long-term benefit of the system is studied. Much existing work applies conventional methods and architectures designed for general tasks to large-scale task processing and is therefore limited by computing capability, data transmission, and other factors. To overcome this, this dissertation proposes LTDR, a large-scale task processing approach that combines a fat-tree structure with a deep neural network model and reinforcement learning. LTDR allocates large-scale tasks through virtual network mapping driven by long-term benefit. For the current physical nodes and task requests, feature extraction describes the status of nodes and tasks and their internal relationships, which constructs the model input and reduces the dimensionality of the raw data. A deep network model based on historical data and a convolutional neural network is designed and trained to make optimal node mapping decisions. Meanwhile, through continuous interaction and trial and error with the environment, Q-learning optimizes virtual link mapping decisions by evaluating environmental feedback. This process is realized with multi-agent reinforcement learning: the whole network is regarded as a multi-agent system, so each node can be regarded as an agent with autonomous learning ability that explores its next-hop node; the overall mathematical model is described as a Markov decision process, and a distributed value function is introduced to explore the optimal virtual link mapping decision. Large-scale tasks for big data are thus allocated to the optimal physical nodes for processing, which avoids overloading nodes and links and improves the overall resource utilization of the system. Experimental results show that, while satisfying large-scale task requests, LTDR significantly improves the utilization of physical resources and the long-term benefit of cloud data centers.

(4) The problems that, in practical applications, large-scale task processing models cannot recognize and process some task requests and tend to over-fit are studied. Because many task processing methods based on deep neural networks are trained on historical data, numerous tasks in real deployments cannot be identified and processed from prior knowledge and experience; at the same time, learning complex structures with deep learning easily leads to model over-fitting. To overcome these problems, an improved large-scale task processing approach based on adaptive deep learning and reinforcement learning, called Tard, is proposed. First, a virtual network mapping method based on an adaptive-dropout deep computing model is designed to allocate large-scale tasks. It improves model quality through model fusion: by averaging the outputs of multiple sub-models, it avoids biased training and prevents over-fitting, so that the model makes correct virtual node mapping decisions. Then, since some data (task requests) in the training set have no corresponding labels, the model is trained with the policy gradient method and the back-propagation algorithm, following the idea of reinforcement learning. In the training stage, an exploratory method searches for the optimal solution, and a greedy mechanism evaluates the effectiveness of the reinforcement learning agent, so that the virtual node mapping scheme evolves toward higher system benefit and finally reaches the optimal virtual network mapping decision. Large-scale tasks are thereby allocated to the proper task processing nodes in the substrate network, achieving efficient task execution. Experimental results show that, while satisfying large-scale task requests, Tard effectively avoids model over-fitting and improves the ability to recognize and process tasks in practical applications.
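The non-dominated selection and crowding degree comparison used in contribution (2) follow standard Pareto-front machinery, which can be sketched as below. This is a minimal illustration of the general technique rather than the dissertation's actual algorithm: the two minimized objectives (e.g. load imbalance and bandwidth cost) and all function names are assumptions.

```python
from typing import List

def dominates(a: List[float], b: List[float]) -> bool:
    """Pareto dominance for minimization: a dominates b if it is
    no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(solutions: List[List[float]]) -> List[List[float]]:
    """Keep only solutions not dominated by any other: the Pareto front,
    i.e. the 'optimal virtual network mapping scheme set'."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

def crowding_distance(front: List[List[float]]) -> List[float]:
    """Crowding distance of each front member; a larger value means a
    less crowded, more diverse solution (boundary points get infinity)."""
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        span = front[order[-1]][k] - front[order[0]][k] or 1.0
        dist[order[0]] = dist[order[-1]] = float("inf")
        for j in range(1, n - 1):
            dist[order[j]] += (front[order[j + 1]][k] - front[order[j - 1]][k]) / span
    return dist
```

A selector could then compare crowding distances over the front to pick one final solution, mirroring the crowding degree comparison described above.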
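The multi-agent link mapping idea in contribution (3), where each node evaluates environmental feedback to learn its next hop, can be sketched with plain tabular Q-learning. The toy graph, reward shaping, and hyper-parameters below are illustrative assumptions and not the dissertation's model.

```python
import random

def q_learning_next_hop(graph, source, target, episodes=2000,
                        alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Each node keeps Q-values over its neighbours; repeated episodes
    reinforce next-hop choices that reach `target` at low link cost.
    `graph` maps node -> {neighbour: link_cost}."""
    rng = random.Random(seed)
    q = {u: {v: 0.0 for v in nbrs} for u, nbrs in graph.items()}
    for _ in range(episodes):
        node = source
        for _ in range(len(graph) * 2):            # bound episode length
            nbrs = q[node]
            if rng.random() < epsilon:              # explore
                nxt = rng.choice(list(nbrs))
            else:                                   # exploit current estimates
                nxt = max(nbrs, key=nbrs.get)
            # pay the link cost; earn a bonus for reaching the target
            reward = -graph[node][nxt] + (10.0 if nxt == target else 0.0)
            future = 0.0 if nxt == target else max(q[nxt].values())
            q[node][nxt] += alpha * (reward + gamma * future - q[node][nxt])
            if nxt == target:
                break
            node = nxt
    path, node = [source], source                   # greedy path read-out
    while node != target and len(path) <= len(graph):
        node = max(q[node], key=q[node].get)
        path.append(node)
    return path
```

On a graph where the route through B is cheaper than through C, the learned greedy path follows B, matching the intuition that agents avoid overloaded or costly links.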
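The sub-model averaging that contribution (4) uses against over-fitting rests on standard inverted dropout, in which inference implicitly averages the ensemble of randomly thinned sub-models. A minimal sketch on a single activation vector, assuming a fixed keep probability; the dissertation's adaptive variant, which tunes this probability, is not shown.

```python
import random

def dropout_forward(x, p_keep, training, rng=random.Random(0)):
    """Inverted dropout: during training each unit survives with
    probability p_keep and is scaled by 1/p_keep, so the expected
    activation matches inference, where every unit is kept.
    Averaging over random masks approximates an ensemble of
    sub-models, which is what counteracts over-fitting."""
    if not training:
        return list(x)          # inference: the implicit model average
    return [xi / p_keep if rng.random() < p_keep else 0.0 for xi in x]
```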
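The policy gradient training with a baseline-style evaluation used in contribution (4) can be illustrated with REINFORCE on a softmax policy over a few discrete actions, standing in for candidate node mappings scored by system benefit. Everything here (the reward vector, learning rates, function name) is a hypothetical stand-in for the dissertation's setup.

```python
import math
import random

def reinforce_softmax(rewards, episodes=2000, lr=0.1, seed=0):
    """REINFORCE: sample an action from a softmax policy, observe its
    reward, and raise the log-probability of that action in proportion
    to its advantage (reward minus a running baseline)."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(episodes):
        exps = [math.exp(t) for t in theta]
        z = sum(exps)
        probs = [e / z for e in exps]
        a = rng.choices(range(len(rewards)), weights=probs)[0]
        r = rewards[a]
        baseline += 0.01 * (r - baseline)           # slow-moving baseline
        adv = r - baseline
        for k in range(len(theta)):
            # d log pi(a) / d theta_k = 1[k == a] - probs[k]
            grad = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * adv * grad
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]                    # final action probabilities
```

Run on deterministic rewards, the policy concentrates probability on the highest-benefit action, which is the "self-evolution toward higher system benefits" in miniature.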
Keywords/Search Tags: Big Data, Cloud Computing, Large-scale Task Processing, Task Allocation, Multi-domain, Virtual Network Mapping, Deep Learning, Reinforcement Learning