Research On Key Technologies Of Big Data Analysis Process Modeling

Posted on: 2021-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Li
Full Text: PDF
GTID: 2428330605966981
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of big data across many fields, promoting the adoption of big data analysis technology and providing tools and platform support for its fast implementation and efficient operation has become an important direction for combining big data analysis with domain applications. Traditional data analysis tools such as SPSS Modeler, which rely on a relational database as their core data engine, cannot process massive data efficiently. Distributed processing platforms and computing frameworks, represented by Hadoop, solve the problems of effective storage and efficient processing, but using them involves many technical details, and the analysis algorithms are tightly coupled to the specific platform, so domain-oriented reusability is poor. Neither approach can meet the demands of increasingly complex big data analysis. This thesis studies big data analysis process modeling and its implementation, and proposes a domain-oriented big data analysis framework centered on domain business and multi-model collaboration. First, the thesis divides the big data analysis process into a two-layer model, with a domain-oriented layer and a platform-oriented layer, and proposes domain-business-driven modeling of the analysis process, so that users can focus on the analysis logic of the problem domain itself. Second, a modular, flow-based big data analysis programming model is designed. An extensible data analysis module model is defined, together with a metadata-based method for describing analysis modules and an inheritance-based method for implementing them. This separates what is common from what varies in big data analysis and supports the reuse of coarse-grained analysis tasks. A model-driven conversion algorithm based on analysis modules is proposed, which maps the data analysis business model (the domain-oriented big data analysis model) to an Oozie-based concurrent execution model (the platform-oriented big data analysis model). This provides flexible reuse, scalability, and ease of use for big data analysis. Finally, an automatic parameter tuning method based on random forest and a genetic algorithm is proposed to adjust the Hadoop parameter configuration, optimizing the performance of the Hadoop system and thereby the execution efficiency of the big data analysis process. In summary, through domain-business-driven modeling of the big data analysis process, this thesis separates the business analysis process from the problem-solving process. The analysis process is designed in a domain-oriented business manner and then converted into platform-oriented process instances that execute in a distributed, parallel fashion, so users need not care about the implementation details of specific algorithms or of process execution when building an analysis process. In this way, industry personnel and algorithm developers can build big data analysis processes interactively and execute them efficiently, lowering the threshold for applying big data analysis technology across industries.
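
As an illustration only, the idea of metadata-described, inheritance-based analysis modules and their mapping to an Oozie-style workflow might be sketched as follows. This is not the thesis's implementation: the class names (`AnalysisModule`, `Cleaning`, `KMeans`), the metadata schema, the `example.*` main classes, and the function `to_oozie_workflow` are all hypothetical. The sketch handles only a linear chain of steps, omits the fork/join nodes that concurrent execution would need, and emits a simplified document that has not been validated against the Oozie schema.

```python
import xml.etree.ElementTree as ET


class AnalysisModule:
    """Base class holding the behaviour common to every analysis module."""

    # Hypothetical metadata schema; subclasses override it to describe themselves.
    metadata = {"name": "base", "main_class": "", "params": []}

    def __init__(self, **params):
        # Common part: validate supplied parameters against the declared metadata.
        unknown = set(params) - set(self.metadata["params"])
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        self.params = params


class Cleaning(AnalysisModule):
    # Variable part: only the metadata differs from the base class.
    metadata = {"name": "cleaning", "main_class": "example.Cleaning",
                "params": ["input", "output"]}


class KMeans(AnalysisModule):
    metadata = {"name": "kmeans", "main_class": "example.KMeans",
                "params": ["input", "output", "k"]}


def to_oozie_workflow(app_name, modules):
    """Map an ordered list of modules to a simplified Oozie-style workflow XML string."""
    root = ET.Element("workflow-app",
                      {"name": app_name, "xmlns": "uri:oozie:workflow:0.5"})
    names = [m.metadata["name"] for m in modules]
    ET.SubElement(root, "start", {"to": names[0] if names else "end"})
    for i, module in enumerate(modules):
        action = ET.SubElement(root, "action", {"name": names[i]})
        java = ET.SubElement(action, "java")
        ET.SubElement(java, "main-class").text = module.metadata["main_class"]
        # Emit arguments in the order declared by the module's metadata.
        for key in module.metadata["params"]:
            if key in module.params:
                ET.SubElement(java, "arg").text = f"--{key}={module.params[key]}"
        # Each step hands over to the next one; the last step goes to "end".
        next_node = names[i + 1] if i + 1 < len(names) else "end"
        ET.SubElement(action, "ok", {"to": next_node})
        ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(root, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Analysis step failed"
    ET.SubElement(root, "end", {"name": "end"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    flow = [Cleaning(input="/raw", output="/clean"),
            KMeans(input="/clean", output="/clusters", k=3)]
    print(to_oozie_workflow("demo-analysis", flow))
```

In this sketch the domain user only assembles module instances. The platform-specific workflow definition is generated from the metadata, which is the separation between the domain-oriented and platform-oriented models that the abstract describes.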
Keywords/Search Tags: big data analysis, model driven, model mapping, process execution optimization, Hadoop