Research And Implementation Of Big Data Platform For Cable Manufacturing Based On Spark

Posted on:2020-09-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y Li

Full Text:PDF

GTID:2428330596476614

Subject:Engineering

Abstract/Summary:

In the era of Industry 4.0, the global manufacturing industry has developed rapidly, and the data generated by manufacturing processes has grown exponentially. Enterprises are paying increasing attention to the use of this data. How to use industrial big data effectively to make enterprise manufacturing more networked and intelligent is one of the challenges facing manufacturers today. This thesis covers the design, construction, and application of a big data platform for cable manufacturing, aimed at making better use of manufacturing big data. Using a platform built on the distributed system architecture Hadoop and the big data processing framework Spark, together with an improved K-Means clustering algorithm, a data processing scheme for the platform is designed and implemented. The main work of this thesis is as follows.

First, big data platform technology is studied and a big data platform is built. After examining the advantages and disadvantages of Hadoop and Spark as platform technologies, this thesis designs the Spark on Yarn Cable Data Platform (SCP), which runs in Spark on Yarn mode. The platform is stable, highly reliable, and fast, with low maintenance costs and high resource utilization. In addition, a scalable, fault-tolerant, fast, and stable data collection system is built by integrating Flume with Kafka.

Second, big data processing technology is studied, and the K-Means algorithm is examined and improved. Spark Streaming, a core component of Spark, is studied and used to clean the data. Spark MLlib, another core component of Spark, is also discussed, and the clustering algorithms it supports are analyzed in depth, including the implementation principle of the K-Means algorithm. K-Means is vulnerable to outliers and its K value is difficult to determine, but it handles Gaussian-distributed data well. Based on these characteristics, the algorithm is optimized and improved through feature scaling, outlier detection and removal, selection of optimal cluster centers, and dimensionality reduction.

Third, data acquisition, data processing, and data analysis are carried out on the cable industry big data platform SCP. The data acquisition system first collects the original production data, which is then passed to the big data platform, where Spark Streaming performs the data cleaning. Finally, the MLlib-based K-Means algorithm is used to analyze the cleaned data and to explore clusters and associations within it. The results provide guidance for equipment parameter adjustment and proactive maintenance in the enterprise, and a reference for subsequent in-depth research.

The research in this thesis finds that an industrial big data platform built on Spark and Hadoop can analyze and process massive data quickly and stably. The Spark interface makes it easy to expand the platform's processing capacity and to enrich its application scenarios.
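The preprocessing steps named in the abstract (feature scaling, outlier removal, and choice of initial cluster centers) can be illustrated with a minimal single-machine sketch in plain Python. This is not the thesis's Spark MLlib implementation, which is not reproduced on this page. The function names, the 3-standard-deviation outlier threshold, and the farthest-point seeding rule are all assumptions made for illustration. Dimensionality reduction and the selection of K are omitted.

```python
import math


def standardize(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has standard deviation 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def preprocess(rows, max_abs_z=3.0):
    """Drop rows with any |z-score| above max_abs_z (assumed threshold), then rescale."""
    z_rows = standardize(rows)
    kept = [raw for raw, z in zip(rows, z_rows)
            if all(abs(x) <= max_abs_z for x in z)]
    return standardize(kept)


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(rows, k):
    """Farthest-point seeding: an assumed stand-in for the thesis's center selection."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    return centers


def kmeans(rows, k, max_iter=50):
    """Plain Lloyd iterations; returns (centers, labels)."""
    centers = init_centers(rows, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [r for r, label in zip(rows, labels) if label == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

A call such as `kmeans(preprocess(rows), k)` takes a list of equal-length numeric rows and returns the cluster centers in the standardized space together with one label per retained row. Rescaling after the outliers are dropped keeps extreme readings from distorting the scale of the remaining data.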

Keywords/Search Tags:

Spark, Big Data platform, K-Means algorithm, Data acquisition

Related items

1	Optimized Design And Implementation Of K-means Algorithm Based On Big Data Spark Platform
2	Research And Application Of K-means++ Algorithm Based On Spark Platform
3	Research On Spark Oriented Fuzzy C-means Clustering Algorithm
4	Design And Implementation Of Telecom 4G Big Data Platform For Network Optimization Based On Spark
5	Research On Parallelization Of Data Mining Algorithm Based On Distributed Platforms Spark And YARN
6	Research And Implementation Of Data Hybrid Computing Platform Based On Spark
7	Parallelizing K-means-based Clustering On Spark
8	Research On Load Allocation Strategy Based On Data Clustering
9	Research And Application Of K-means Algorithm Based On Density And Distance
10	Research And Implementation Of Unified Large Data Mining Service Platform Based On Spark MLlib