Research And Implementation Of Big Data Integrated Development Platform

Posted on: 2022-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Zhu
GTID: 2518306338470334
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the mobile Internet, the amount of data on the Internet has exploded. This massive data holds enormous value, and how to mine that value better and faster has become a central concern for data owners. Big data processing technology is now booming, and many excellent big data computing frameworks have been released, providing reliable solutions for computing over and processing massive data. Extracting value from massive data is no longer difficult in itself. However, the traditional data development model limits the efficiency of data development and value extraction. Preliminary technical research and analysis identified the following problems in the traditional big data development model:

1) The data development process is cumbersome and lacks a unified big data development environment. Developers have to interact with the big data cluster through the command line, which leads to inefficient development.
2) Synchronization methods for multi-source data are not uniform, and stand-alone data synchronization easily hits performance bottlenecks, which makes data interchange difficult. A unified distributed data synchronization scheme is urgently needed.
3) Big data processing involves too much manual intervention and cannot be automated, which seriously affects the efficiency and quality of data production.

To address these problems, this thesis studies big data processing automation and distributed data synchronization technology, and completes the research and implementation of a multi-scenario-oriented big data integrated development platform. The main research contents are as follows:

1) A distributed data synchronization scheme based on DataX is proposed and implemented. Building on research into and improvement of the open source data synchronization tool DataX, a distributed data synchronization tool is constructed that unifies the synchronization schemes for multi-source heterogeneous data and avoids the performance bottleneck of single-machine data synchronization.
2) A method for orchestrating and optimizing big data hybrid task flows is proposed. The method organizes the different types of tasks in a complex big data processing workflow into a DAG-based hybrid task flow, so that the workflow can be automated through automatic scheduling of the hybrid task flow.
3) A one-stop big data integrated development platform is designed and implemented. The web-based platform provides a unified development environment for different big data technologies and processing frameworks, and gives big data developers one-stop data development capabilities.

The resulting multi-scenario big data integrated development platform provides developers with full-link solutions from data generation through data synchronization, data storage, and data processing to data consumption. The platform has been applied in the national key R&D project "Research and Development of Technology Consulting and Service Platform Based on Big Data", which verifies the validity and practicability of the platform and the scheme.
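
The idea of a DAG-based hybrid task flow can be illustrated with a minimal sketch. This is not the thesis's implementation: the task names, the task types other than DataX synchronization, and the `schedule_in_waves` helper are all hypothetical, chosen only to show how tasks of different kinds can be ordered by their dependencies so that each batch ("wave") contains only tasks whose upstream tasks have finished.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical hybrid task flow: task name -> (task type, upstream tasks).
# The task types are illustrative labels, not the platform's actual types.
TASKS = {
    "sync_orders": ("datax_sync", []),
    "sync_users": ("datax_sync", []),
    "clean_orders": ("sql", ["sync_orders"]),
    "join_report": ("sql", ["clean_orders", "sync_users"]),
    "export_report": ("shell", ["join_report"]),
}


def schedule_in_waves(tasks):
    """Group tasks into waves; every task in a wave has all upstream tasks done."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in tasks.items()})
    sorter.prepare()  # raises graphlib.CycleError if the flow is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks runnable right now
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished successfully
    return waves


waves = schedule_in_waves(TASKS)
for index, wave in enumerate(waves):
    print(index, [(name, TASKS[name][0]) for name in wave])
```

In a real scheduler each wave would be dispatched to executors matching the task type, and a task would be marked done only after it actually succeeds; the sketch skips both to keep the ordering logic visible.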
Keywords/Search Tags: big data development, data synchronization, hybrid task flow, integrated development platform