
Research And Implementation Of Big Data Analysis System For Multiple Scenarios

Posted on: 2021-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhou
GTID: 2428330632462919
Subject: Computer Science and Technology
Abstract/Summary:
This thesis stems from the construction of the big data sharing service platform for the resource center of the national science and technology condition platform in the field of medical health. The goal of the project is to gather human genetic resource data from provinces and cities across the country, and to build a platform that offers researchers at the National Health Commission Research Institute big data analysis, mining, and data services. The project has three requirements: (1) a self-built big data analysis platform; (2) flexible big data analysis capabilities in multi-dimensional scenarios; (3) analysis capabilities in distributed mining scenarios, to overcome the bottleneck that software such as SPSS hits when computing over massive data. In response to these requirements, this thesis studies the technologies behind multi-dimensional analysis scenarios and data mining scenarios, and designs and implements a multi-scenario big data analysis system. The main contributions are as follows: (1) For the multi-dimensional analysis scenario, a hybrid query engine is designed and implemented. Users do not need to select a query engine manually, because queries are routed to an engine automatically. The engine serves both solidified (predefined) analysis and exploratory analysis, and it adapts more flexibly to new data sources and changing analysis requirements. (2) For the data mining scenario, a method for constructing Spark-based machine learning pipelines is proposed. It defines the stages of visual machine learning and provides verification and execution methods for the processing flow, which makes analysis more efficient. (3) A multi-scenario big data analysis system is designed and implemented, covering two analysis scenarios: multi-dimensional analysis and data mining. The system is built on open source big data components such as Kylin and Spark. It also provides a rich set of visualization charts and lets users carry out complex analysis through simple operations, meeting the institute's needs for multi-scenario analysis and visual analysis.
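The abstract does not say how the hybrid query engine decides where to send a query. As a minimal sketch of what automatic routing could look like, the Python below assumes one plausible rule: a query whose dimensions and measures are all covered by a precomputed cube goes to the cube engine (Kylin in the thesis), and anything else falls back to a general-purpose engine (Spark in the thesis). The names `Query`, `CubeModel`, and `route`, the coverage rule, and the example cube are all hypothetical illustrations, not the thesis's actual design, and no Kylin or Spark API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A multi-dimensional query reduced to the columns it touches (hypothetical model)."""
    dimensions: frozenset
    measures: frozenset


@dataclass(frozen=True)
class CubeModel:
    """Metadata for one precomputed cube (hypothetical model, not a Kylin API object)."""
    name: str
    dimensions: frozenset
    measures: frozenset

    def covers(self, query: Query) -> bool:
        # A cube can answer the query only if it holds every requested column.
        return (query.dimensions <= self.dimensions
                and query.measures <= self.measures)


def route(query: Query, cubes: list) -> tuple:
    """Pick an engine without user input (assumed rule, see lead-in).

    Returns ("kylin", cube_name) when a cube covers the query,
    otherwise ("spark", None) for ad hoc, exploratory queries.
    """
    for cube in cubes:
        if cube.covers(query):
            return ("kylin", cube.name)
    return ("spark", None)


# Example cube and queries; the column names are made up for illustration.
cubes = [
    CubeModel(
        name="sample_cube",
        dimensions=frozenset({"province", "year"}),
        measures=frozenset({"sample_count"}),
    )
]

solidified = Query(frozenset({"province"}), frozenset({"sample_count"}))
exploratory = Query(frozenset({"hospital"}), frozenset({"sample_count"}))

print(route(solidified, cubes))   # covered by the cube
print(route(exploratory, cubes))  # "hospital" is not in any cube
```

Under this assumed rule, predefined analyses benefit from precomputation, while queries over new data sources or new dimensions still run because they fall through to the general engine.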
Keywords/Search Tags: big data analysis, multidimensional analysis, visual machine learning, Spark, Kylin