| Balancing storage cost against system performance has always been a core goal of database management system design. Hybrid storage architectures are widely used in modern databases, and each layer of storage media offers a different access speed and storage cost. The data stored in a database is usually divided into two parts, hot and cold. How to identify hot and cold data accurately and place it on different storage media according to its degree of heat, so as to maximize system performance while reducing storage cost, has long been an active research topic in the database field. At present, the identification of hot and cold data mostly depends on specific data structures such as LRU, in which the relative position of a data item determines whether it is hot or cold. This approach cannot fully reflect how hot or cold the data is, and the result is neither quantifiable nor persistent. This thesis therefore aims to develop a scientific method for identifying hot and cold data that improves identification accuracy and quantifies the degree of heat. The main contributions of this thesis are as follows: 1. A temperature model based on Newton's law of cooling is proposed to measure how hot or cold data is. It quantifies and visualizes the degree of heat, and the temperature can be stored as a persistent attribute of the data. 2. Based on the temperature model, a temperature-based cache replacement policy named TCR (Temperature Cache Replacement) is proposed. Compared with the traditional LRU algorithm, TCR achieves a higher cache hit rate. Furthermore, to overcome the shortcomings of the plain temperature model algorithm, the T-LRU (Temperature Least Current Used) cache replacement policy, which combines the temperature model with the LRU algorithm, is proposed. Compared with LRU, T-LRU increases the hit rate by 20% to 150%, especially when the cache capacity is small. At the system level, T-LRU has a lower response time than LRU and yields better overall system performance. 3. For the e-commerce scenario, in order to reduce the storage cost of online and historical databases, two models for identifying and migrating hot and cold data are proposed, one based on the temperature model and one based on machine learning (GBDT). In identifying hot and cold data, the temperature-model approach is slightly more accurate than the LRU-based approach and incurs lower overhead. The machine-learning model reaches an accuracy above 90%, which allows cold data to be identified and migrated as early as possible and reduces storage cost while preserving system performance. |
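
To make the idea of a temperature-based replacement policy concrete, the following is a minimal sketch and not the TCR or T-LRU algorithm of the thesis. The abstract does not give the temperature formula, so the sketch assumes the simplest form of Newton's law of cooling with an ambient temperature of zero, T(t) = T(t0) * exp(-k * (t - t0)), plus a fixed heat increment on every access. The class name `TemperatureCache` and the parameters `cooling_rate` and `heat_per_access` are hypothetical.

```python
import math


class TemperatureCache:
    """Toy cache that evicts the coldest entry (illustrative assumption only)."""

    def __init__(self, capacity, cooling_rate=0.1, heat_per_access=1.0):
        self.capacity = capacity
        self.cooling_rate = cooling_rate        # assumed cooling constant k
        self.heat_per_access = heat_per_access  # assumed heat added per access
        self.entries = {}  # key -> (value, temperature, time of last update)

    def temperature(self, key, now):
        """Current temperature of a key after exponential cooling since its last update."""
        _, temp, last = self.entries[key]
        return temp * math.exp(-self.cooling_rate * (now - last))

    def get(self, key, now):
        """Return the cached value and heat the entry, or None on a miss."""
        if key not in self.entries:
            return None
        value = self.entries[key][0]
        heated = self.temperature(key, now) + self.heat_per_access
        self.entries[key] = (value, heated, now)
        return value

    def put(self, key, value, now):
        """Insert or update a key, evicting the coldest entry when the cache is full."""
        if key in self.entries:
            temp = self.temperature(key, now) + self.heat_per_access
        else:
            if len(self.entries) >= self.capacity:
                coldest = min(self.entries, key=lambda k: self.temperature(k, now))
                del self.entries[coldest]
            temp = self.heat_per_access
        self.entries[key] = (value, temp, now)


cache = TemperatureCache(capacity=2)
cache.put("a", 1, now=0)
for t in (1, 2, 3):
    cache.get("a", now=t)   # "a" is accessed repeatedly and heats up
cache.put("b", 2, now=4)    # "b" is touched once, more recently than "a"
cache.put("c", 3, now=5)    # cache is full: the coldest entry is evicted
print(sorted(cache.entries))
```

In this usage example a plain LRU policy would evict `a`, because `b` was touched more recently. The temperature-based sketch evicts `b` instead, since the repeated accesses to `a` leave it warmer even after cooling. Because each temperature is a number stored alongside the value, it could also be persisted with the data, which is the property the abstract highlights.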