Research On Data Migration Mechanism Of Greenplum Database Based On Software Defined Network

Posted on: 2019-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xie
Full Text: PDF
GTID: 2428330596966428
Subject: Software engineering
Abstract/Summary:
The widespread adoption of Internet technology and Internet Plus has caused an explosion of user data, and data at this scale has in turn driven the evolution of distributed databases. Data migration is a key technology for distributed databases: it ensures load balancing and keeps services performing well. Greenplum is a popular database that uses a massively parallel processing (MPP) architecture and offers high concurrency, high performance, and a good cost-performance ratio. This thesis therefore takes the Greenplum database as its research object and studies data migration in a database cluster environment.

Greenplum supports two kinds of migration scheme: migrating compute nodes and adding compute nodes. The former has a low migration cost, but the database cluster must be shut down before the migration, so users cannot access the database while it runs. The latter has a high migration cost: the migration triggers a redistribution of database tables, which consumes a large amount of database resources and lengthens the response time of user requests. Neither scheme is friendly to user requests. This thesis therefore seeks a better migration mechanism for the Greenplum database. The specific work is as follows:

(1) The scheme of adding compute nodes is tested experimentally, and a data migration model that accounts for both user requests and migration operations is designed from the experimental results. The model has two parts: a load balancing model and a migration cost model. The load balancing model describes the load distribution across the child nodes of the database cluster using the Shannon information entropy formula; the value of the information entropy determines whether the load across the database nodes is evenly distributed. The load balancing model is the key factor that triggers a migration operation. The migration cost model comprises the cost of user requests and the cost of migrating data, where the user request cost is the cost of Greenplum executing the SQL statements.

(2) The migration operations of all database nodes are formulated as a combinatorial optimization problem whose solution is the optimal migration sequence, and an improved grey wolf optimizer algorithm is proposed to solve it. The algorithm is improved in two respects: parameter optimization and the fusion of evolution operators. Parameter optimization adds weight factors and a niche radius and replaces the linearly decreasing parameter with a nonlinearly decreasing one. The fusion of evolution operators applies mutation, recombination, crossover, and selection to grey wolf individuals during iteration. While increasing population diversity, the algorithm uses the niche radius to eliminate individuals with low fitness, which keeps the population size constant and moves the population closer to the optimal solution.

(3) To allocate network bandwidth dynamically between user requests and migration operations, the idea of Software-Defined Networking (SDN) is integrated into the Greenplum migration mechanism. The control logic of the SDN controller is programmed on demand, and network bandwidth is allocated dynamically according to the weights of the user request cost and the migration cost.

Comparative experiments are carried out to verify this work. Compared with the grey wolf optimizer algorithm and the particle swarm optimization algorithm, the improved grey wolf optimizer is shown to be superior and effective in terms of the number of iterations and the execution time. Compared with Greenplum's scheme of adding compute nodes, the SDN-based migration mechanism is shown to be feasible and effective in terms of migration time and user request response time.
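
The entropy-based load balancing check described in item (1) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so several things here are assumptions: the function names, normalizing the entropy by log2(n) so the value lies in [0, 1], and the 0.9 trigger threshold.

```python
import math


def load_entropy(loads):
    """Shannon entropy (in bits) of the load shares across nodes."""
    if any(x < 0 for x in loads):
        raise ValueError("loads must be non-negative")
    total = sum(loads)
    if total <= 0:
        raise ValueError("total load must be positive")
    # Nodes with zero load contribute nothing (0 * log 0 is taken as 0).
    shares = [x / total for x in loads if x > 0]
    return -sum(p * math.log2(p) for p in shares)


def normalized_entropy(loads):
    """Entropy divided by its maximum log2(n): 1.0 means perfectly even load.

    Normalizing is an assumption made here so that one threshold works
    for clusters of any size.
    """
    n = len(loads)
    if n < 2:
        return 1.0
    return load_entropy(loads) / math.log2(n)


def needs_migration(loads, threshold=0.9):
    """Hypothetical trigger: migrate when the load is too uneven.

    The threshold value is illustrative, not taken from the thesis.
    """
    return normalized_entropy(loads) < threshold
```

With four nodes carrying equal load the normalized entropy is at its maximum and no migration is triggered, while a cluster in which one node carries nearly all of the load falls well below the threshold.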
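
The weight-based bandwidth allocation in item (3) might look like the sketch below. The abstract only says that bandwidth is allocated according to the weights of the two costs; the proportional split, the guaranteed minimum share, and all names are assumptions made for illustration, and no real SDN controller API is used.

```python
def split_bandwidth(total_mbps, request_cost, migration_cost, min_share=0.1):
    """Split link bandwidth between user requests and migration traffic.

    Assumption: each traffic class receives bandwidth in proportion to its
    cost, and neither class drops below `min_share` of the link so that
    it is never starved. Returns (request_mbps, migration_mbps).
    """
    if total_mbps <= 0:
        raise ValueError("total_mbps must be positive")
    if request_cost < 0 or migration_cost < 0:
        raise ValueError("costs must be non-negative")
    if not 0 <= min_share <= 0.5:
        raise ValueError("min_share must be in [0, 0.5]")

    total_cost = request_cost + migration_cost
    # With no cost information at all, fall back to an even split.
    share = 0.5 if total_cost == 0 else request_cost / total_cost
    # Clamp so both classes keep at least min_share of the link.
    share = min(max(share, min_share), 1 - min_share)
    return total_mbps * share, total_mbps * (1 - share)
```

In a real deployment the two returned rates would be pushed to the switches by the controller, for example as per-queue rate limits; that step depends on the controller in use and is left out here.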
Keywords/Search Tags:Greenplum, Load Balance Model, Migration Cost Model, Grey Wolf Optimizer, Software-Defined Network