
Research On Distributed Massive Data Analysis Method And The Implementation Of Visualization System

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Lin
Full Text: PDF
GTID: 2518306107982899
Subject: Engineering
Abstract/Summary:
As modern society becomes increasingly information-driven, data volumes are growing rapidly and the era of big data has arrived. Because massive data comes from complex sources, in diverse formats, and is widely dispersed, there is a need for an extensible big data platform that covers core business and day-to-day data and supports data cleaning, integration, analysis, and mining. Based on the data integration and analysis platform project carried out by the laboratory, this thesis designs and implements a data integration and analysis platform with multi-source channels, multi-structure data access, and massive data storage and analysis capabilities. The platform provides data integration services for various data sources and unifies data of different formats into a consistent representation, which simplifies global management; it then mines the data in depth, extracts commercially valuable data, and performs high-quality data analysis. The main contribution of this thesis is the design and implementation of a K-Means algorithm based on Mahout and a K-Means algorithm based on Spark, together with a performance comparison of the two. Following the data analysis work, a visualization system is developed and status monitoring of the platform servers is provided. These results form one of the main parts of the laboratory's data integration and analysis platform project. Finally, performance testing on Huawei Cloud shows that the work meets the expected targets and runs well on the platform. The main research work and final results of this thesis are as follows:

(1) Data analysis and massive data testing. An offline analysis service is provided for the high-quality data produced by data cleaning. The specific work includes the design and implementation of the K-Means algorithm based on Mahout and the K-Means algorithm based on Spark. Three open datasets are used to evaluate the performance of both implementations, and the algorithms on the two distributed platforms are compared and analyzed. Beyond theoretical analysis and testing in a simulated environment, further testing is needed in an environment that actually supports petabyte-scale data, so that the results can better guide and improve practical engineering applications. Huawei Cloud is selected as the test environment, and the Spark-based K-Means algorithm is used to produce a data analysis test report covering data volumes from 100 GB to 16 TB.

(2) Real-time platform monitoring. Zabbix, an enterprise-grade open-source software package, is used for server monitoring. Platform monitoring helps ensure the safety and reliability of platform data and provides evaluation and real-time monitoring of platform operation.

(3) Design and implementation of the data visualization system. The databases of the visualization system are designed and the system's technical architecture design is completed. The design and code implementation of the key functions the visualization system must provide are described. Finally, the running results of the completed visualization system are shown.
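
For readers unfamiliar with the clustering step evaluated in item (1), the following is a minimal single-machine sketch of K-Means (Lloyd's algorithm) in plain Python. It is not the thesis's Mahout or Spark implementation, which is not shown in this abstract. The function names, the caller-supplied initial centroids, and the stopping tolerance are all assumptions made for illustration. The loop is arranged the way a distributed version is commonly organized: an assignment pass that could run in parallel over partitions, followed by a per-cluster aggregation that recomputes the centroids.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points: List[Point],
           initial_centroids: List[Point],
           max_iterations: int = 20,
           tolerance: float = 1e-9) -> List[Point]:
    """Illustrative K-Means; initial centroids are supplied by the caller."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        # Assignment pass ("map"): each point contributes to its nearest cluster.
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = defaultdict(int)
        for p in points:
            idx = nearest_centroid(p, centroids)
            if idx not in sums:
                sums[idx] = [0.0] * len(p)
            for d, value in enumerate(p):
                sums[idx][d] += value
            counts[idx] += 1

        # Aggregation pass ("reduce"): new centroid = mean of assigned points.
        # A cluster that received no points keeps its previous centroid.
        new_centroids: List[Point] = []
        for i, old in enumerate(centroids):
            if counts[i]:
                new_centroids.append(tuple(s / counts[i] for s in sums[i]))
            else:
                new_centroids.append(old)

        # Stop once no centroid moved by more than the tolerance.
        shift = max(squared_distance(o, n)
                    for o, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tolerance:
            break
    return centroids
```

As a usage example, calling `kmeans([(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)], [(0.0, 0.0), (10.0, 0.0)])` separates the four points into a left and a right cluster. In a distributed setting the assignment pass is the part that benefits from being spread across workers, since only the per-cluster sums and counts need to be combined between iterations.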
Keywords/Search Tags: Big data, data analysis, K-Means algorithm, visualization system