
Distributed File System Load Balancing In Cloud Environment

Posted on: 2020-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wu
GTID: 2428330590995385
Subject: Computer application technology
Abstract/Summary:
With the rapid development of cloud computing and Internet technology, growing demand for information and interaction across the Internet generate huge amounts of data. A traditional file system that stores data on a single server cannot hold data at this scale, so storage systems designed for massive data have emerged. Distributed file systems built on the client-server model overcome the limits of stand-alone storage by storing data cooperatively across multiple servers. To store massive data in a cloud environment, a distributed file system involves a large number of data server nodes and network devices. These nodes may be located in different places and may differ in configuration. As online tasks run and data is read and written, data storage becomes unbalanced across nodes, and the degree of storage balance has a significant effect on system performance. Data load balancing for distributed file systems in the cloud environment has therefore become an important research topic.

In this thesis, we analyze in detail the system performance degradation caused by uneven data block storage in the Hadoop Distributed File System (HDFS). We address two aspects of the data migration process, the threshold and the node matching scheme, and propose a multivariable dynamic threshold adjustment strategy and an optimization algorithm based on queue sorting.

The multivariable dynamic threshold adjustment strategy systematically evaluates multiple factors that affect the data nodes of a Hadoop cluster. Disk space usage, CPU utilization, memory utilization, disk I/O occupancy, and network bandwidth usage are combined into an expression for computing the threshold. While the algorithm runs, this value serves as the data migration threshold, which achieves adaptive dynamic load balancing and improves the balance of data across the server cluster.

The improved algorithm based on queue sorting addresses the randomness of node matching in the load balancing process by introducing a sorting strategy for the node storage lists. The source node storage queue and the target node storage queue are sorted by data node space utilization, so that nodes are selected in order and data on heavily loaded nodes is migrated preferentially to lightly loaded nodes. Adjusting the heaviest and lightest nodes first improves the efficiency of cluster data migration.

Theoretical analysis and experimental results show that the dynamic threshold scheme achieves a better balancing effect and higher utilization of computing resources than the algorithm with a static input threshold. The optimization algorithm based on queue sorting is more efficient than Balancer in the process of balancing.
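The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the threshold expression, so the weights, the clamping range, and the linear formula in `dynamic_threshold` are hypothetical stand-ins, as are the names `DataNode`, `composite_load`, and `plan_pairs`. The only parts taken from the text are the five input metrics and the idea of sorting source and target queues by space utilization so that the heaviest node is paired with the lightest. The over/under-utilized test (utilization further than the threshold from the cluster average) follows the convention of the HDFS Balancer.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    disk_used: float  # fraction of disk space in use, 0..1
    cpu: float        # CPU utilization, 0..1
    memory: float     # memory utilization, 0..1
    disk_io: float    # disk I/O occupancy, 0..1
    network: float    # network bandwidth usage, 0..1


# Hypothetical weights (sum to 1.0); the thesis's actual expression is not given.
WEIGHTS = {"disk_used": 0.4, "cpu": 0.15, "memory": 0.15,
           "disk_io": 0.15, "network": 0.15}


def composite_load(node, weights=WEIGHTS):
    """Weighted sum of the five metrics for one node."""
    return sum(w * getattr(node, metric) for metric, w in weights.items())


def dynamic_threshold(nodes, weights=WEIGHTS, low=0.05, high=0.20):
    """Assumed formula: scale the threshold linearly between low and high
    according to the mean composite load of the cluster."""
    mean_load = sum(composite_load(n, weights) for n in nodes) / len(nodes)
    return low + (high - low) * mean_load


def plan_pairs(nodes, threshold):
    """Pair over-utilized sources with under-utilized targets in sorted order."""
    avg = sum(n.disk_used for n in nodes) / len(nodes)
    # Source queue: heaviest node first.
    sources = sorted((n for n in nodes if n.disk_used > avg + threshold),
                     key=lambda n: n.disk_used, reverse=True)
    # Target queue: lightest node first.
    targets = sorted((n for n in nodes if n.disk_used < avg - threshold),
                     key=lambda n: n.disk_used)
    return [(s.name, t.name) for s, t in zip(sources, targets)]
```

Calling `plan_pairs(nodes, dynamic_threshold(nodes))` recomputes the threshold from current metrics before each balancing round, which is the adaptive behavior the abstract describes, in contrast to a fixed threshold supplied as input.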
Keywords/Search Tags: cloud environment, HDFS, load balancing, dynamic threshold, queue sort