
Design And Implementation Of Big Data Resource Sharing Platform Based On Spark

Posted on: 2020-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: P Z Ren
GTID: 2428330575995171
Subject: Software engineering

Abstract/Summary:
With the rise of the mobile internet, the number of telecom operator users has grown dramatically and now reaches the hundred-million level, while the monthly data volume of a single operator department reaches the petabyte level. More and more departments within the operator recognize the application value of this huge amount of data. Unicom's old Spark big data platform supports only one business line, and every operation must go through the command line, which is unfriendly to new users and carries a high learning cost; as the number of users grows, resource management becomes chaotic. To support the big data workloads of many departments, the original Spark platform needs to be upgraded into a multi-tenant resource sharing platform, which is why the Spark big data sharing platform project was established.

During project development, the author first took part in the feasibility and requirements analysis, analyzed the platform's requirements, and established the overall objectives of the project. Based on these requirements, the platform is divided into a data warehouse management module, a computing task management module, a memory file management module, a platform monitoring module, and a user management module. In the outline design, the author then designs the platform's overall framework, execution flow, and database tables, and draws the platform architecture diagram, the module hierarchy diagram, and the database entity-relationship diagram.

Based on the outline design, the platform is designed in detail. The author uses the Spark computing engine, the Hive data warehouse, the MySQL database, the InfluxDB time-series database, and the Akka toolkit to design and implement the platform's modules. Because the data warehouse management module and the computing task management module are resource-intensive, each is designed as a service cluster to improve performance and scalability. In the cluster design, the author investigates various load balancing algorithms and selects one that balances tasks according to the resources each task requires (a resource-aware placement sketch follows below). In the data warehouse management module, the author designs and implements a distributed data warehouse connection pool that separates long connections from short connections, which speeds up warehouse access and ensures that short SQL statements run with priority (see the connection pool sketch below). In the platform monitoring module, the author uses the InfluxDB time-series database to store the monitoring data collected by each acquisition component, keeping query responses fast. When the monitoring module detects an anomaly, it sends the information to the anomaly-handling component, which reacts according to the anomaly type (see the actor sketch below); administrators no longer need to watch the platform continuously, because the platform notifies them automatically when an anomaly occurs, which reduces operation and maintenance costs.

After the multi-tenant upgrade and optimization, the platform is far more convenient to use: users can focus on implementing business logic while the platform allocates resources according to their needs and visualizes the status of their tasks. The new platform has passed both functional and non-functional tests, meets the go-live criteria, and is awaiting deployment.
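The resource-aware load balancing described above can be illustrated with a minimal Scala sketch. The names `NodeLoad` and `pickNode` and the memory/core fields are hypothetical and stand in for the actual algorithm chosen during the cluster design, which also weighs other metrics.

```scala
// Resource-aware task placement sketch: route each request to the node with the
// most free memory that can still satisfy it. All names and fields are illustrative.
object LoadBalanceSketch {
  case class NodeLoad(id: String, freeMemMb: Long, freeCores: Int)

  def pickNode(nodes: Seq[NodeLoad], needMemMb: Long, needCores: Int): Option[NodeLoad] =
    nodes
      .filter(n => n.freeMemMb >= needMemMb && n.freeCores >= needCores)
      .sortBy(n => (-n.freeMemMb, -n.freeCores))   // prefer the least loaded node
      .headOption

  def main(args: Array[String]): Unit = {
    val nodes = Seq(NodeLoad("warehouse-1", 4096, 2), NodeLoad("warehouse-2", 8192, 4))
    println(pickNode(nodes, needMemMb = 2048, needCores = 2)) // -> warehouse-2
  }
}
```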
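The long/short connection split in the data warehouse connection pool could look roughly like the sketch below, assuming a Hive JDBC driver on the classpath. `TieredHivePool`, the pool sizes, and the one-second threshold for "short" SQL are illustrative assumptions, not the thesis's actual parameters.

```scala
import java.sql.{Connection, DriverManager}
import java.util.concurrent.ArrayBlockingQueue

// Two-tier connection pool sketch: short SQL borrows from its own pool so it is
// never queued behind long-running statements.
class TieredHivePool(jdbcUrl: String, shortSlots: Int, longSlots: Int) {
  private def newPool(n: Int): ArrayBlockingQueue[Connection] = {
    val q = new ArrayBlockingQueue[Connection](n)
    (1 to n).foreach(_ => q.put(DriverManager.getConnection(jdbcUrl)))
    q
  }
  private val shortPool = newPool(shortSlots) // reserved for quick statements
  private val longPool  = newPool(longSlots)  // long-running analytical SQL

  /** Execute a statement on the pool chosen by its estimated cost in milliseconds. */
  def execute(sql: String, estimatedMs: Long): Unit = {
    val pool = if (estimatedMs < 1000L) shortPool else longPool
    val conn = pool.take()                  // blocks until a connection is free
    try conn.createStatement().execute(sql)
    finally pool.put(conn)                  // return the connection to its pool
  }
}
```

Keeping the two pools separate means a burst of long analytical queries cannot exhaust the connections needed by short, interactive SQL.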
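Finally, the anomaly-handling component built on Akka could resemble the classic-actor sketch below. The message types and the `notifyManagers` stub are assumptions standing in for the platform's real anomaly categories and alarm channels.

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Illustrative anomaly messages sent by the monitoring module.
sealed trait Anomaly
case class NodeDown(host: String)                     extends Anomaly
case class HighMemoryUsage(host: String, pct: Double) extends Anomaly

// The handler reacts per anomaly type and then notifies the administrators,
// so nobody has to watch the platform continuously.
class AnomalyHandler extends Actor {
  // stand-in for the real notification channel (mail / SMS gateway)
  private def notifyManagers(msg: String): Unit = println(s"[ALERT] $msg")

  def receive: Receive = {
    case NodeDown(host) =>
      notifyManagers(s"node $host is unreachable, restarting its services")
    case HighMemoryUsage(host, pct) =>
      notifyManagers(f"memory on $host at $pct%.1f%%, throttling new submissions")
  }
}

object MonitorDemo extends App {
  val system  = ActorSystem("platform-monitor")
  val handler = system.actorOf(Props[AnomalyHandler](), "anomaly-handler")
  handler ! HighMemoryUsage("worker-03", 92.5) // would be sent by a collector
}
```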
Keywords/Search Tags: Spark, Big Data Platform, Akka, Multi-tenant, Cluster, Distributed