| With the deep integration of cloud computing and edge computing, cloud-edge collaboration has become a typical model for supporting large-scale distributed mobile applications. Data replica layout is an important factor affecting the service quality of a cloud-edge collaborative cluster, especially for data-intensive applications. However, because of inherent features such as distributed locations, device diversity, network heterogeneity and dynamic resources, conventional replica optimization mechanisms that focus on high availability and reliability struggle to meet the demands of service quality and green computing at the same time. The Internet of Everything is driving rapid growth in data volume, and data-intensive applications have become mainstream in the cloud-edge collaboration environment. How to provide cost-effective replica deployment and optimization solutions for cloud-edge service providers, while improving users’ QoS satisfaction and reducing cloud-edge cluster management and maintenance costs, has become a research hotspot in both academia and industry. The main contents and innovations of this paper are as follows: (1) To address the "high consumption and low efficiency" that unreasonable data replica deployment imposes on large-scale cloud-edge IT infrastructures, an energy-aware data replica deployment strategy named HEERD (High Energy Efficiency Replica Deployment Strategy) is proposed. To actively perceive the overall energy efficiency of the cloud-edge cluster, a novel combined monitoring scheme is designed for performance monitoring and power consumption data acquisition. A feature reduction method based on information entropy is also proposed to improve the computational efficiency of the model. Then, we construct a regression model for power consumption prediction based on a deep neural network 
architecture. Next, the energy- and cost-aware data replica deployment problem is formulated as a bi-objective optimization problem and modeled mathematically. A replica deployment strategy based on an immune genetic-particle swarm optimization algorithm is then proposed to find the optimal solution. The experimental results show that, compared with the PMCR and DARS replica placement algorithms, HEERD saves 5.2% and 8.5% of the average power consumption, respectively, when the job scale is large, and also effectively reduces network and storage overhead. (2) To address the coarseness, blindness and time lag with which existing data replica placement strategies create and update data replicas, we propose a cooperative replica prefetching mechanism called CRP (Correlation-aware Replica Prefetching Mechanism), which avoids the impact of "data silos". An overall architecture for correlation-aware replica prefetching is designed, and a prefetching module is added to each edge cloud to support collaborative data prefetching. According to their access characteristics, files are divided into explicit high-value files and implicit high-value files, and the overall correlation-aware replica prefetching process is designed. An access rule management method based on consistent hashing is proposed, and an access rule storage and query mechanism is designed on top of an adjacency list storage structure. The experimental results show that, compared with the benchmark algorithms, CRP reduces the average response time by 4.8%-14.5% and achieves an average prefetching accuracy of 63.9% for I/O-intensive jobs, while effectively utilizing network bandwidth and controlling the frequency of replica creation. (3) For highly dynamic user requests, to address the "data skew", "replica flooding" and "resource overflow" caused by unreasonable data replica placement, this 
paper proposes a two-stage data replica management mechanism based on "Recommendation & Learning", named TRM (Two-stage Replica Management Mechanism). A typical application scenario for mobile users in a MEC environment is presented and a two-stage replica management framework is designed; the target problem is then formally described and mathematically modeled. On this basis, the TRM strategy is proposed. In the replica recommendation stage, a replica recommendation engine based on motion prediction and feedback optimization is proposed. In the replica placement stage, a replica placement rule learning model based on A3C reinforcement learning is proposed. The experimental results show that, compared with existing replica placement methods, TRM reduces the waiting delay by 1.28%-5.55% and saves 2.68%-9.4% of the cost. (4) To meet the deadline and execution cost requirements of data-intensive workflows in the cloud-edge environment, a replica layout strategy for data-intensive workflows is proposed to improve workflow execution efficiency. First, we describe and define the objects related to data-intensive workflows. Second, by designing and analyzing a typical data-intensive workflow instance, we construct a mathematical model of the multi-constraint workflow replica layout problem. Then, a data replica placement strategy based on the binary artificial bee colony algorithm is proposed, and the time complexity of the algorithm is analyzed in detail. The experimental results show that, compared with the RPS and GA algorithms, the proposed strategy saves 26.3% and 12.2% of the economic cost, respectively, and maintains a lower workflow default rate. Overall, this dissertation explores a series of replica optimization mechanisms for data-intensive applications under cloud-edge collaboration that target high QoS and low overhead. The evaluation results verify the rationality and efficiency of the proposed algorithms in managing replicas. Meanwhile, our studies provide an effective solution for 
challenges in real-life cloud-edge collaboration systems and have important theoretical significance and application value. |
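
The abstract names a consistent-hash-based access rule management method for CRP but does not give its details. As a rough illustration of the general technique only, and not of the dissertation's actual design, the following sketch shows how a consistent hash ring could assign the access rules of a file to one of several edge clouds. The class `HashRing`, the virtual-node count, the choice of MD5, and the node names `edge-a`, `edge-b`, `edge-c` are all assumptions introduced for this example.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a ring position (MD5 is an arbitrary stable choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place `vnodes` virtual points for `node` on the ring."""
        for i in range(self._vnodes):
            point = ring_hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the (very unlikely) position collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node: str) -> None:
        """Drop every virtual point that belongs to `node`."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_hash(key))
        return self._owner[self._points[idx % len(self._points)]]


if __name__ == "__main__":
    ring = HashRing(["edge-a", "edge-b", "edge-c"])  # assumed node names
    print(ring.lookup("file-42"))
```

The property that makes this attractive for rule management is that removing (or adding) an edge cloud only remaps the keys that node owned; rules assigned to the other nodes stay where they are.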