Research And Implementation Of Industrial Big Data Processing Platform Based On YARN

Posted on: 2016-12-16  Degree: Master  Type: Thesis
Country: China  Candidate: C G Feng  Full Text: PDF
GTID: 2348330488472935  Subject: Mechanical Manufacturing and Automation
Abstract/Summary:
With the widespread adoption of digital, networking, and virtualization technologies, the level of informatization and intelligence in industrial enterprises has improved significantly, and the volume of structured, semi-structured, and unstructured industrial data they produce has grown exponentially. Enterprises have accumulated vast amounts of industrial data, which creates a need to mine valuable information from it. Although decision makers recognize that this data holds great economic value, they still lack advanced technologies and methods to manage and analyze it. It is therefore of far-reaching significance to design and implement a big data processing platform that can store and manage the massive data generated by production and manufacturing, extract as much of its hidden value as possible, and help enterprises move from a business-driven model to a data-driven intelligent manufacturing model.
Industrial data is characterized by large scale, strong real-time requirements, diverse data types, wide dispersion, and low value density. Traditional data management and analysis platforms can no longer meet the demands of industrial data analysis and application, so new and effective industrial big data analysis platforms need to be studied. This thesis designs and implements a base platform for storing, managing, and analyzing industrial big data. The main work and research results are as follows:
Current mainstream big data storage and management technologies are studied, and the distributed file system HDFS and NoSQL database technology are used to address the scale of industrial big data. A multilevel storage system is designed for multi-source, heterogeneous industrial big data to meet the storage-model requirements of different data types, and a unified data adapter lets users access and manage the data.
Different big data computing models are studied: MapReduce is used for offline batch computing, Spark for fast iterative computing, and Storm for stream computing on industrial big data. The three computing models are integrated on YARN in a shared-cluster mode to meet the differing timeliness requirements of business applications for data analysis. Dominant Resource Fairness is used to allocate resources fairly among the computing models on the platform.
The open-source Apache Hue is used to provide users with an interactive, visual big data analysis interface through which they can conveniently submit MapReduce applications, Hive SQL commands, and Spark applications, run interactive queries, and view data analysis results. The R language is also integrated with the platform to give users a multi-language application development environment.
To remove the single point of failure in the master-slave architecture, a hot-standby mechanism for the master node is used to make the platform highly available.
To address the scale and efficiency problems faced by traditional data mining algorithms, several data mining algorithms are implemented in parallel on the platform under multiple computing models, forming a data mining algorithm library for efficient analysis and processing of industrial data.
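As an illustration of the Dominant Resource Fairness idea mentioned above, the following is a minimal sketch and not the platform's actual scheduler. It assumes each framework has a fixed per-task demand, repeatedly gives the next task to the framework with the smallest dominant share, and drops a framework once its next task no longer fits. The function name `drf_allocate`, the framework names, and the cluster sizes are all hypothetical.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Sketch of DRF by progressive filling (illustrative assumption only).

    capacity: dict resource -> total integer amount in the cluster
    demands:  dict framework -> dict resource -> integer demand per task
    Returns a dict framework -> number of tasks granted.
    """
    used = {r: 0 for r in capacity}
    tasks = {f: 0 for f in demands}
    active = set(demands)

    def dominant_share(f):
        # A framework's dominant share is its largest fractional use
        # of any single resource in the cluster.
        return max(Fraction(tasks[f] * demands[f][r], capacity[r])
                   for r in capacity)

    while active:
        # Serve the framework with the smallest dominant share
        # (ties are broken by name so the result is deterministic).
        f = min(sorted(active), key=dominant_share)
        need = demands[f]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
            tasks[f] += 1
        else:
            # The next task of this framework no longer fits; stop serving it.
            active.discard(f)
    return tasks


# Hypothetical cluster: 9 CPU cores and 18 GB of memory.
cluster = {"cpu": 9, "mem_gb": 18}
per_task = {
    "batch_job": {"cpu": 1, "mem_gb": 4},   # memory-dominant
    "stream_job": {"cpu": 3, "mem_gb": 1},  # CPU-dominant
}
print(drf_allocate(cluster, per_task))
```

With these assumed numbers, both frameworks end with a dominant share of 2/3: the memory-heavy job uses 12 of 18 GB and the CPU-heavy job uses 6 of 9 cores.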
Keywords/Search Tags: Industrial big data, YARN, Multilevel Storage, Batch Processing, Stream Processing, Data Mining