| With the rapid development of technologies such as big data, artificial intelligence and cloud computing, applications such as e-commerce, business intelligence and big data analysis place ever higher demands on massive data storage, and these demands must be continually adjusted as data volumes grow. In today's information age, data is everywhere and accumulates day by day. These data sources are often complex and scattered, and the application and business systems that produce them are independent of one another, forming information islands. In order to collect and store massive data, ensure its security, and manage it well enough to serve enterprise, social and scientific-research needs in the era of big data, a unified data center is needed.

Against the background of the big data integrated analysis platform project of the distributed computing laboratory of the School of Big Data and Software of Chongqing University and Chongqing MCC CCID, this paper develops a big data platform and deploys it on a Hadoop cluster. The platform mainly uses HDFS to store the massive data generated in the fields of smart cities and smart manufacturing, completes the integration and cleaning of these data through Sqoop and related ETL tools, and implements a visualization system. Finally, a simple test on massive data is performed to verify the stability of the platform.

The main research work completed in this paper and the final results achieved include:

(1) Through research on the HDFS storage mechanism and the HDFS high-availability mechanism, the early solutions to the NameNode single point of failure problem (the metadata backup solution, the Secondary NameNode solution, the Backup Node solution and the Avatar Node solution) are described and compared. The current high-availability solution is analyzed in detail, an optimized scheme based on the improved Hadoop 2.x high-availability solution is proposed, and the optimized scheme is verified through active/standby NameNode switchover tests, as sketched below.

(2) Sqoop, a component of the big data ecosystem, is adopted to migrate data into HDFS, Hive and HBase, integrating the various data sources so that global data analysis becomes possible; example import commands are sketched below.

(3) Kettle is used to build customized cleaning flows for the enterprise's massive data, removing dirty data and noise data that are irrelevant to enterprise decision making. The specific functions include incomplete data processing, duplicate data processing, data merging and data conversion.

(4) After data integration and cleaning, a visualization platform based on the HDFS distributed file system is designed and implemented. The visualization system is built mainly on the Spring Boot development framework, and its main functions include single sign-on, database connection and data set generation; a minimal controller sketch is given below. |
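As an illustration of the Hadoop 2.x high-availability mechanism discussed in (1), the sketch below lists the core hdfs-site.xml properties for an active/standby NameNode pair that shares edits through the Quorum Journal Manager, together with the haadmin commands typically used in a manual switchover test. The nameservice name mycluster, the host names and the JournalNode list are illustrative placeholders, not values taken from the project; fencing and ZooKeeper quorum settings are omitted.

    <!-- hdfs-site.xml: one logical nameservice served by two NameNodes -->
    <property><name>dfs.nameservices</name><value>mycluster</value></property>
    <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
    <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>master1:8020</value></property>
    <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>master2:8020</value></property>
    <!-- shared edit log stored on the JournalNode quorum -->
    <property><name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value></property>
    <!-- client-side failover proxy and automatic failover via ZooKeeper -->
    <property><name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
    <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>

    # manual active/standby switchover used during a controlled failover test
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -failover nn1 nn2

With automatic failover enabled, a ZKFailoverController process on each NameNode host performs the switchover when the active node fails; the haadmin commands above are only needed for controlled tests.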
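The Sqoop migration described in (2) can be illustrated with commands of the following form. The JDBC connection string, table names and Hive/HBase targets are hypothetical examples introduced here for illustration, not the project's actual data sources.

    # relational table -> Hive (imports into HDFS and registers a Hive table)
    sqoop import \
      --connect jdbc:mysql://db-host:3306/erp --username etl -P \
      --table orders --num-mappers 4 \
      --hive-import --hive-table orders

    # relational table -> HBase (row key and column family must be specified)
    sqoop import \
      --connect jdbc:mysql://db-host:3306/erp --username etl -P \
      --table devices \
      --hbase-table device_info --column-family info --hbase-row-key device_id

A plain HDFS import replaces the Hive/HBase options with --target-dir pointing at the destination directory.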
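For the visualization platform in (4), a minimal Java sketch of a dataset-generation endpoint is shown below. It assumes a Spring Boot application with spring-boot-starter-web and spring-boot-starter-jdbc on the classpath; the package, class and endpoint names are hypothetical and stand in for the system's actual interfaces.

    package com.example.visualization;

    import java.util.List;
    import java.util.Map;

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    // Hypothetical controller: turns a configured database connection into a
    // data set that the front-end charts can consume.
    @RestController
    @RequestMapping("/api/datasets")
    public class DataSetController {

        private final JdbcTemplate jdbcTemplate; // backed by the platform's configured data source

        public DataSetController(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        // Returns all rows of the named table as a list of column-name/value maps.
        @GetMapping("/{table}")
        public List<Map<String, Object>> generate(@PathVariable String table) {
            // Placeholder query; a real implementation would validate the table name
            // and apply the aggregation rules defined for the data set.
            return jdbcTemplate.queryForList("SELECT * FROM " + table);
        }
    }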