
Platform Design And Massive Data Processing And Implementation Based On Mobile Business Services

Posted on: 2021-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: S L Pu
GTID: 2518306569990029
Subject: Electronics and Communications Engineering
Abstract/Summary:
At present, the big data industry is developing rapidly and has been applied in all walks of life. It plays an extremely important role in finance, education, industry, medical treatment, and even current epidemic prevention work. In this era of rapid change in big data technology, the Internet of Things, the Internet, and 5G intelligent terminals have penetrated every corner of life. Terminals are everywhere, things can be sensed, and Internet access is always available. With the rapid expansion of production data, the interaction of massive data has formed a close and efficient network that contains the secrets of life dynamics and behavior, and big data technology has become an indispensable key to understanding them. More and more enterprises, institutions, and organizations have realized the role of big data and have begun to look for ways to extract the huge potential value contained in their data. Taking operators as an example: faced with more than ten years of accumulated data and information on hundreds of millions of users nationwide, how to establish a big data platform and make effective use of the existing data has become a difficult problem. To improve the storage capacity and computing efficiency of the data analysis environment and to make data easier to use and analyze, this paper proposes a Cloudera Hadoop cluster for the collection, storage, processing, analysis, and evaluation of multi-dimensional data across multiple domains. The paper has conducted research in the following aspects:

(1) Starting from the background of the topic, this paper surveys the common processing methods and development directions of data platforms at home and abroad in terms of data collection, data processing, and business architecture. Taking the operator as an example, it outlines the requirements of the data management system for different types of data (B domain, O domain, signaling, comprehensive mining) and different forms of data (database, compressed file, log), including how data is collected, processed, and analyzed along the business process to produce the data tables provided to products and customers. It also focuses on the idea of data asset management involved in the design flow.

(2) The core module of the data management system is explored in detail: data acquisition, detailed processing, and data realization of massive relational data through Spark. The business requirements of the core module, common solutions, current difficulties, and key processing points are analyzed to lay the groundwork for the subsequent detailed design.

(3) The business design of the platform is closely combined with the technology stack (such as Hadoop 2.X, Hive, Spark, and Flink) so that MapReduce (Hive SQL) programs and Spark (Python) programs each play their respective roles and deliver their respective performance. The data processing flows corresponding to different modules and models are then constructed in the middle platform according to their business classification, and the data processing scripts are uniformly scheduled and integrated by the overall BDI scheduling process. Together with Kafka, Flume, Sqoop, ZooKeeper, and HBase, these form a complete big data processing platform.

(4) The data processing platform in this paper is a Cloudera Hadoop cluster. To process the different kinds of data (taking one cluster as an example), 600-1000 physical machines need to be built into a unified cluster. This paper discusses the monitoring requirements of such a large cluster and the characteristics and structure of Prometheus and Alertmanager, and describes the process, key points, common errors, and platform architecture involved in building the cluster.

The final research result is a complete architecture covering data collection, preprocessing, processing, analysis results, and operation and maintenance monitoring: a distributed data processing and monitoring system based on Hadoop 2.X (Cloudera version) that supports the collection, processing, and analysis of the data platform as well as the monitoring and verification of the complete data stream corresponding to the product output.
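
As a rough illustration of the collection step described in the abstract, the sketch below groups data sources by the tool that might ingest them. It is not the thesis's implementation. The abstract names Sqoop and Flume but does not say which data form each one handles, so the mapping here is an assumption, and `hdfs_put`, `Source`, `plan_ingestion`, and the example source names are hypothetical.

```python
from dataclasses import dataclass

# Assumed mapping from data form to ingestion tool (illustration only;
# the abstract does not state which tool handles which form).
COLLECTOR_BY_FORMAT = {
    "database": "sqoop",            # bulk import from relational databases
    "log": "flume",                 # streaming log collection
    "compressed_file": "hdfs_put",  # hypothetical: unpack and copy into HDFS
}

# Data types listed in the abstract.
DOMAINS = {"B", "O", "signaling", "comprehensive_mining"}


@dataclass(frozen=True)
class Source:
    """One upstream data source (hypothetical structure)."""
    name: str
    domain: str
    fmt: str


def plan_ingestion(sources):
    """Group source names by the collector assumed to ingest them."""
    plan = {}
    for src in sources:
        if src.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {src.domain}")
        if src.fmt not in COLLECTOR_BY_FORMAT:
            raise ValueError(f"unknown format: {src.fmt}")
        plan.setdefault(COLLECTOR_BY_FORMAT[src.fmt], []).append(src.name)
    return plan


if __name__ == "__main__":
    example = [
        Source("billing", "B", "database"),
        Source("network_alarms", "O", "log"),
        Source("cdr_archive", "signaling", "compressed_file"),
    ]
    for collector, names in plan_ingestion(example).items():
        print(collector, names)
```

Keeping the mapping in one table means a scheduler could validate every source before any job is launched, which matters when one failed source would otherwise stall a long batch.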
Keywords/Search Tags:data processing, platform design, Spark Streaming, tools integration, data monitoring