| Industry 4.0 and Industrial IoT technology are rapidly driving data digitization in many fields, including energy. Energy data comes from multiple sources such as electric, wind, solar and other renewable energy sources. The data has diverse types and complex structures, and its scale is growing rapidly. NoSQL databases have emerged in the course of digitizing such massive data. A NoSQL database offers weakly structured storage, easy expansion, scalability, and high concurrency, which makes it suitable for storing massive energy data. However, storing energy data unreasonably may lead to data redundancy, wasted storage space, and difficulty in using the data. Against this background, this paper adopts different distributed storage frameworks for different types of energy big data and conducts in-depth research on load balancing, caching strategies, and data consistency. The main work of the paper is as follows: (1) To solve the load imbalance problem of HBase, a global load balancing algorithm based on Regions and RegionServers is proposed. Regions are pre-partitioned, and Rowkeys are generated by a consistent hashing algorithm with virtual nodes. A greedy algorithm is adopted to rebalance the load of the RegionServers periodically, which improves the read and write performance of HBase. The experiments show that, compared with the original HBase implementation, the new algorithm significantly improves data writing performance and makes the cluster load more balanced. (2) Because HBase lacks a caching algorithm that accounts for query frequency, a strategy is proposed that uses Redis as a data caching layer and a coprocessor to ensure the consistency of data writes. A cache replacement algorithm based on time-smoothed update and query frequencies is also proposed; it measures the influence of historical data on current data in order to increase the cache hit ratio and improve data query performance. The experimental results show that the cache hit ratio and query efficiency are effectively improved compared with the original cache algorithm.
(3) In response to the lack of a cache design in FastDFS, a strategy that uses Redis as a compressed cache layer is designed. Since Redis stores data as strings, a format that cannot be applied to files directly, a file compression and caching strategy based on Base64 and Gzip is proposed to store compressed files in Redis, which increases the space utilization rate of Redis. Because energy files are frequently uploaded and queried, a cache replacement algorithm based on time-smoothed query frequency and popularity values is proposed to improve file query efficiency and increase the cache hit ratio. The experimental results show that file query performance, cache hit ratio, and space utilization rate are effectively improved after applying the new strategies and algorithms. (4) A unified distributed storage middleware that integrates the above work is designed. It uses HBase and Redis to store structured data and FastDFS and Redis to store file data, and it provides a unified query interface to end users. |