| The storage of big data has become a key problem in the rapid development of the Internet. Large volumes of data need to be stored and analyzed, and relational databases can no longer keep up with the rapid growth of data. In this scenario, non-relational databases have been widely adopted, showing a clear advantage in massive data storage and highly concurrent access, and research on non-relational databases has become an active field. As a non-relational database, HBase is used as a back-end database because of its scalability and high availability. As a distributed, open-source storage system, HBase can use inexpensive computers to build a highly reliable storage cluster. HBase is a column-oriented database system that uses Hadoop's HDFS as its internal file storage system, and it is an important member of the Hadoop ecosystem. As living standards rise, more attention is being paid to health indicators, so it is of great significance to build a comprehensive health big data platform for monitoring various diseases. The health big data platform uses HBase as its back-end database, and HBase load balancing is crucial to its overall performance. HBase's original load balancing algorithm is analyzed; the basic idea of its policy is to ensure that every Region Server holds the same number of Regions. In a real application scenario, however, data is not accessed with uniform frequency, and some data may be accessed frequently as hot data. Because Regions are not accessed equally, the load may become unbalanced and degrade request response efficiency: some Regions become hot spots, causing some Region Servers to be overloaded. For load balancing in a distributed database, it is therefore very important to take data hotness into account. A load balancing algorithm is designed that uses a prediction method based on the historical number of requests to each Region Server, and the predicted data hotness is used as the load of the Region Server. At the same time, a Cost scoring function for the cluster is constructed that takes five factors into consideration: read request score, write request score, memstore size score, StoreFile size score, and locality score. During the construction of the experimental platform, a data table model was extracted from the “basic data set of urban and rural residents' health records” approved by the Ministry of Health of the People's Republic of China. The row key of the table was designed, and pre-partitioning was used to improve system performance. The experiments used hbase-1.1.12 and hadoop-2.5.1. The experiments verify the above optimizations and show that the optimized solution improves the performance of the HBase health big data platform. |
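
The two ideas named above, predicting Region Server load from request history and combining five factors into one cluster Cost, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the prediction method or the factor weights, so the exponential smoothing in `predict_requests`, the values in `WEIGHTS`, the `imbalance` measure, and all class and function names are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServerLoad:
    """Per-Region-Server metrics (hypothetical structure)."""
    read_requests: float   # predicted read requests
    write_requests: float  # predicted write requests
    memstore_size: float   # memstore size, any consistent unit
    storefile_size: float  # StoreFile size, any consistent unit
    locality: float        # fraction of data stored locally, 0..1


# Assumed weights; the abstract does not specify them.
WEIGHTS: Dict[str, float] = {
    "read": 5.0, "write": 5.0, "memstore": 2.0, "storefile": 2.0, "locality": 3.0,
}


def predict_requests(history: List[float], alpha: float = 0.5) -> float:
    """Predict the next request count by exponential smoothing (an assumed method)."""
    if not history:
        raise ValueError("history must not be empty")
    estimate = history[0]
    for value in history[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def imbalance(values: List[float]) -> float:
    """Return 0 when all servers carry equal load; grows as load concentrates."""
    total = sum(values)
    if total == 0:
        return 0.0
    mean = total / len(values)
    return sum(abs(v - mean) for v in values) / (2 * total)


def cluster_cost(servers: List[ServerLoad]) -> float:
    """Weighted sum of the five factor scores; lower means better balanced."""
    cost = WEIGHTS["read"] * imbalance([s.read_requests for s in servers])
    cost += WEIGHTS["write"] * imbalance([s.write_requests for s in servers])
    cost += WEIGHTS["memstore"] * imbalance([s.memstore_size for s in servers])
    cost += WEIGHTS["storefile"] * imbalance([s.storefile_size for s in servers])
    mean_locality = sum(s.locality for s in servers) / len(servers)
    cost += WEIGHTS["locality"] * (1.0 - mean_locality)
    return cost
```

Under these assumptions, a balancer would compare `cluster_cost` before and after a candidate Region move and keep the move only if the cost decreases, so that a Region Server holding hot Regions is penalized even when Region counts are equal.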