| The alpine grassland ecosystem is the ecosystem with the highest altitude and the largest area in China. It is also one of the most important animal husbandry bases in China. It plays an irreplaceable role in the ecological functions of the plateau, such as water conservation, biodiversity conservation, and carbon fixation. Owing to the sensitivity of the alpine ecosystem, global climate change, and human activities, the alpine grassland ecosystem has begun to degrade, and the degradation has accelerated in recent years. Comprehensively evaluating the degradation status of alpine grassland and applying targeted treatment measures to degraded grassland are of great significance for solving the problem of alpine grassland degradation. In recent years, a great deal of work has been done on monitoring and restoring alpine grassland ecosystems, and the volume of monitoring data on alpine grasslands has grown rapidly. These large-scale alpine grassland data include information on meteorological observations, soil and water conservation, grassland characteristics, quadrat monitoring, resource statistics, and water quality assessment. By mining and comprehensively analyzing these massive data with low value density, it is possible not only to evaluate the degradation of alpine grassland scientifically but also to provide a reference for policy formulation and scientific decision-making. Evaluating grassland degradation from massive data rests on the storage and analysis of those data. Therefore, it is of great significance to store the massive alpine grassland data reliably and to analyze them efficiently. Taking the storage and analysis of massive alpine grassland data as its requirement, this paper designs an overall analysis system for massive alpine grassland data based on Hive and implements the system concretely. First, independent deployment of the Hadoop, Hive, and Sqoop environments is accomplished through steps such as node configuration, cluster configuration, and Hadoop component configuration, and the establishment of a system 
infrastructure platform is achieved. Then, data ETL and data storage are completed through steps such as data filling using the EM algorithm, data import, and partitioned data storage. After that, each query and analysis function in the system is implemented in code, and the data analysis results are post-processed to complete the implementation of the alpine grassland massive data analysis system. Finally, the performance of the system was tested experimentally to evaluate the value of the system. The main experimental results are as follows: (1) Data storage and read performance test on the Hadoop platform. When the number of files is fixed at 10 and the size of each file increases, the overall data size increases, and the overall storage and read times of the system increase steadily. However, the average running time (the average time for processing 1 MB of data) decreases. This shows that the system’s ability to process large amounts of data in parallel becomes evident as the amount of data increases. (2) Data query efficiency test. Using the grassland quadrat monitoring data and some simulated data from the counties in Qinghai Province in 2014, with a total data volume of approximately 39.58 million records (7.56 GB), the data query efficiency of the Hive cluster is compared with that of the relational database SQL Server. The results show that when the query data volume is 39.58 million records, the data query time of the Hive cluster is 67.8% of that of SQL Server. This indicates that as the data volume increases, the data query efficiency of the system becomes higher than that of SQL Server. This paper uses distributed data warehouse technology to store and analyze the massive data of alpine grasslands, which is a significant improvement over traditional data storage and analysis techniques. The system has high efficiency in the processing of massive data and is 
well suited to further development. The design method and design ideas adopted are feasible and can meet the storage and analysis requirements of massive alpine grassland data well. |
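The two performance measures reported in the abstract, the average time for processing 1 MB of data and the Hive query time expressed as a share of the SQL Server query time, can be illustrated with a minimal Python sketch. The function names `avg_time_per_mb` and `relative_query_time` are hypothetical, and the timings in the demonstration are illustrative placeholders, not measurements from the paper.

```python
def avg_time_per_mb(total_seconds, n_files, file_size_mb):
    """Average time to process 1 MB: total run time divided by total data size."""
    total_mb = n_files * file_size_mb
    if total_mb <= 0:
        raise ValueError("total data size must be positive")
    return total_seconds / total_mb


def relative_query_time(hive_seconds, sql_server_seconds):
    """Hive query time as a fraction of the SQL Server query time."""
    if sql_server_seconds <= 0:
        raise ValueError("SQL Server query time must be positive")
    return hive_seconds / sql_server_seconds


if __name__ == "__main__":
    # Hypothetical runs (NOT the paper's measurements):
    # (number of files, size of each file in MB, total run time in seconds)
    runs = [(10, 10.0, 20.0), (10, 100.0, 60.0)]
    for n_files, size_mb, seconds in runs:
        per_mb = avg_time_per_mb(seconds, n_files, size_mb)
        print(f"{n_files} files x {size_mb} MB: total {seconds} s, {per_mb:.3f} s/MB")

    # Hypothetical query timings chosen only to reproduce a 67.8% ratio.
    ratio = relative_query_time(67.8, 100.0)
    print(f"Hive query time is {ratio:.1%} of the SQL Server query time")
```

In this sketch the total run time grows with the data size while the per-MB time falls, which is the pattern the abstract attributes to parallel processing of larger data volumes.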