
Research On Hybrid Storage Layout Optimization Model With Multiple I/O Concurrency

Posted on: 2020-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhao
Full Text: PDF
GTID: 2428330620958177
Subject: Computer application technology
Abstract/Summary:
As the volume of data continues to grow, storing data efficiently and safely, and mining and exploiting big data more deeply, has become an important and urgent problem. Hybrid storage, in which storage devices are built from media with different performance and characteristics, is a current research hotspot in the storage field. The performance of a storage system is measured primarily by system load and request processing speed, both of which are closely tied to the data layout strategy. Studying data layout strategies to improve system performance is therefore key to building an efficient and reliable hybrid storage system. The main research contents of this thesis are as follows:

(1) Analyze the target hybrid storage system, select an architecture in which SSD and HDD sit at the same layer as the hybrid storage structure, and use the different storage characteristics of the disks to store data.

(2) Analyze existing data layout strategies and improve on the SP and SOR strategies. Data nodes are classified by the average load of the system, and the data are sorted in descending order of service time. Data are first stored in Greedy mode on nodes whose load is below the average, until the load of each such node exceeds the average; the remaining data are then stored on all nodes in Round-Robin mode. The storage disk is selected according to the heat of the data, so that large data are not written to the SSD, which would shorten its write lifetime.

(3) Use a genetic algorithm to optimize system performance, constructing a fitness function from both system load and request response. To address the slow convergence and local-optimum problems of genetic algorithms, the initial population is constrained to avoid generating invalid individuals, adaptive crossover and mutation operators are used for the genetic operations, and the selection algorithm is improved to raise population quality, ensure that good individuals are passed on to the next generation, and make the population more diverse.

The experimental results show that when the population size is between 30 and 50 and the number of generations exceeds 600, the improved genetic algorithm performs best both in finding the global optimal solution and in convergence speed. To assess the impact of the layout strategy on system performance, experiments were run on a Hadoop cluster, varying the number of data nodes and the amount of data requested to observe how the three strategies differ in system response time and load. The results show that the improved strategy still performs well when the system load is large: compared with the SP and SOR strategies, system performance increased by 21.60% and 16.38%, respectively. The data layout strategy proposed in this thesis therefore significantly improves system performance.
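
The two-phase placement described in item (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Item` and `Node` types, the `choose_disk` rule and its thresholds, and the choice to take the "average load" as the mean of the current node loads are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    service_time: float  # estimated time to serve a request for this item
    heat: float          # access frequency on a hypothetical 0..1 scale
    size_mb: float


@dataclass
class Node:
    name: str
    load: float = 0.0
    items: list = field(default_factory=list)


def choose_disk(item, heat_threshold=0.5, max_ssd_size_mb=64.0):
    """Assumed rule: hot, small items go to SSD; everything else to HDD."""
    if item.heat >= heat_threshold and item.size_mb <= max_ssd_size_mb:
        return "SSD"
    return "HDD"


def place(items, nodes):
    """Greedy fill of under-loaded nodes, then round-robin over all nodes."""
    # Assumption: the average is taken over the nodes' current loads.
    average = sum(n.load for n in nodes) / len(nodes)
    # Sort the data in descending order of service time.
    pending = sorted(items, key=lambda i: i.service_time, reverse=True)
    pos = 0

    # Phase 1 (greedy): fill each node whose load is below the average
    # until its load reaches or exceeds the average.
    for node in nodes:
        while pos < len(pending) and node.load < average:
            item = pending[pos]
            node.items.append((item.name, choose_disk(item)))
            node.load += item.service_time
            pos += 1

    # Phase 2 (round-robin): spread the remaining items over all nodes.
    for offset, item in enumerate(pending[pos:]):
        node = nodes[offset % len(nodes)]
        node.items.append((item.name, choose_disk(item)))
        node.load += item.service_time
    return nodes
```

Calling `place(items, nodes)` mutates and returns the node list; each node then holds `(item name, disk)` pairs and its updated load. A real system would derive service time and heat from request traces rather than take them as given.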
Keywords/Search Tags: Hybrid storage, data layout, load balancing, genetic algorithm, Hadoop