Research On Storage Mechanism Based On Data Hotness And Coldness

Posted on: 2024-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: S X Wu
GTID: 2568306941469904
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid construction of the ubiquitous electric power Internet of Things and the continued build-out of data centers, the State Grid holds increasingly rich data resources, with a growing number of data providers and data users. However, data management still has many problems, which leave the computing and storage resources of the data middle platform underutilized and waste the corresponding construction and maintenance costs. The main pain point in a data center system is that a large amount of cold data, that is, data with low access frequency, occupies substantial storage resources and degrades the system's data access performance. Hybrid storage combined with hot and cold data management can address these problems: a hot and cold data identification method distinguishes hot data from cold data, and the two are then stored on different storage media to reduce storage cost and improve data access performance.

The main work of this thesis is as follows. First, the hot and cold data sources are analyzed and preprocessed to extract access features over the data life cycle. Second, to address the imbalance between hot and cold data samples, a focal loss function based on cross-entropy is proposed to increase the weight of hard-to-classify samples. Third, because training the XGBoost model involves hyperparameters that must be tuned, a genetic algorithm, one of the swarm intelligence optimization algorithms, is introduced to improve the model's overall ability to distinguish hot and cold data. Experimental results show that the final model achieves an accuracy of 0.9450, an AUC of 0.9407, and an F1 score of 0.6228 on the positive class. All three metrics improve over both the unimproved model and the model with focal loss alone.

In addition, to meet the requirements of hybrid storage management, this thesis divides the hierarchical storage management system into a data content management module, a hot and cold data identification module, and a data migration module, proposes a migration strategy based on data hotness, and defines two overall evaluation metrics for the system: access hit rate and cost saving rate. Experimental results show that the improved XGBoost algorithm achieves an average access hit rate of 89.98% when identifying hot and cold data, which is 16.94%, 12.38%, and 11.50% higher than the FIFO, LFU, and LRU algorithms, respectively. Notably, although the cost saving rates of the different algorithms differ only slightly because hot data makes up a small proportion of the total, the improved XGBoost algorithm still performs significantly better than the three traditional algorithms.
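
The loss function described in the abstract, commonly called focal loss, can be illustrated with a short sketch. It assumes the standard binary form, in which the cross-entropy term is scaled by alpha_t * (1 - p_t)^gamma. The abstract does not give the thesis's exact formulation or parameter values, so the function names and the defaults gamma = 2.0 and alpha = 0.25 are illustrative assumptions only.

```python
import numpy as np

def binary_cross_entropy(p, y):
    """Per-sample cross-entropy for predicted hot-probability p and label y (1 = hot)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Per-sample focal loss: cross-entropy scaled by alpha_t * (1 - p_t)**gamma.

    p_t is the probability assigned to the true class. gamma and alpha are
    illustrative defaults (assumptions), not values taken from the thesis.
    """
    p = np.clip(p, 1e-12, 1 - 1e-12)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# Two hot samples (y = 1): one easy (p = 0.95) and one hard (p = 0.10).
p = np.array([0.95, 0.10])
y = np.array([1, 1])

ce = binary_cross_entropy(p, y)
fl = focal_loss(p, y)

# Ratio of hard-sample loss to easy-sample loss under each objective.
ce_ratio = ce[1] / ce[0]
fl_ratio = fl[1] / fl[0]
print(f"cross-entropy hard/easy ratio: {ce_ratio:.1f}")
print(f"focal loss    hard/easy ratio: {fl_ratio:.1f}")
```

In this sketch the hard sample dominates the easy one far more strongly under focal loss than under plain cross-entropy, which is the sense in which the weight of hard-to-classify samples is increased. Using such a loss as a custom XGBoost objective would also require its first and second derivatives with respect to the raw score, which are omitted here.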
Keywords/Search Tags:Hot and cold data, Hybrid storage, Machine learning, Genetic algorithm