
Design And Implementation Of A Data Sharing Platform For Coal Enterprise Based On Hadoop

Posted on: 2016-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhao
Full Text: PDF
GTID: 2309330509450906
Subject: Signal and Information Processing
Abstract/Summary:
Coal accounts for a large share of China's energy consumption, and large state-owned coal enterprises hold the dominant position in the country's coal production, making outstanding contributions to China's energy security, economic development, and social stability. To improve production efficiency, reduce operating costs, and prevent mine safety accidents, coal enterprises have built their own information systems. Because these systems were built without unified planning, however, data cannot be shared between the different systems within an enterprise, and many "information islands" have formed. Some coal enterprises have begun to break down these islands by gradually establishing data sharing platforms, but the existing platforms cannot meet the massive data-processing demands of today's coal-enterprise information systems. Hadoop is a cluster-based distributed system architecture that provides high-speed computing and mass storage on inexpensive machines. It makes large-scale data processing more convenient and offers an effective way to solve the problems of the coal-enterprise data center.

Through research and analysis of the big-data problems faced by coal-enterprise data centers, this thesis establishes a data sharing model for the coal enterprise. In this model, a data warehouse is built on Hadoop: data extracted from the source databases according to requirements is integrated and stored in the warehouse, and a unified data interface provides outward data access and data analysis services to users. Then, based on the actual needs of a coal enterprise and taking its production data as an example, the thesis designs the data sharing platform, completing the design of the Hadoop platform and the server data model. The data extraction process based on Sqoop is also discussed in this thesis.
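The abstract does not give the actual connection strings or table names, but a Sqoop-based extraction into a Hive warehouse is typically driven by a command of the following shape. The JDBC URL, table, and database names below are hypothetical placeholders, not details from the thesis:

```python
def sqoop_import_cmd(jdbc_url, table, hive_db, mappers=4):
    """Build the argument list for a Sqoop 1 import that lands a source
    table directly in a Hive database (the extraction step of the model).

    All identifiers passed in are illustrative; real deployments supply
    credentials and split columns as well.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC URL of the source database
        "--table", table,               # source table to extract
        "--hive-import",                # load the result into Hive
        "--hive-database", hive_db,     # target Hive database (warehouse)
        "--num-mappers", str(mappers),  # degree of parallel extraction
    ]

# Hypothetical production-data table being pulled into the warehouse
cmd = sqoop_import_cmd(
    "jdbc:oracle:thin:@//db-host:1521/prod", "PRODUCTION_DAILY", "coal_dw"
)
```

In a real run this list would be handed to the shell (e.g. via `subprocess.run`) on a node with Sqoop and Hadoop configured; the sketch only fixes the command's structure.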
The thesis then defines the requirements and methods for data cleaning and transformation, details the application of the parallel FP-Growth algorithm, and designs a few simple example applications. Finally, the platform's functions are implemented as far as possible. Following the design requirements, Sqoop extracts data from the source databases into the Hive data warehouse; Hive is integrated with Eclipse, and the data cleaning and loading programs are developed under Eclipse; the parallel FP-Growth analysis is performed by calling Mahout from Eclipse, with the results stored in the database; and, with the processed data loaded into the platform's database, a sample application based on the Spring framework is developed on the server.

The Hadoop-based data sharing platform for coal enterprises uses mature open-source technologies on the Hadoop platform, giving it powerful processing capability and high stability. It not only improves the efficiency of data sharing and meets the demand for data analysis and processing, but also reduces the operating cost of the enterprise data center. The platform can effectively solve the "information island" and massive-data problems ubiquitous in coal-enterprise information systems, and coal enterprises can further develop specific functions on top of it to support production and operation.
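The parallel FP-Growth step mines frequent itemsets from transaction-style records. As a minimal, Hadoop-free illustration of the kind of result that analysis produces, the sketch below mines frequent itemsets by brute-force counting over a tiny invented set of shift-event records (FP-Growth computes the same answer far more efficiently on large data; the event names here are hypothetical, not from the thesis):

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return every itemset occurring in at least `min_support` transactions.

    Brute-force version for illustration only: it enumerates candidate
    itemsets level by level and stops once a level yields nothing frequent
    (by the Apriori property, larger itemsets cannot be frequent either).
    """
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        counts = Counter()
        for t in transactions:
            tset = set(t)
            for combo in combinations(items, k):
                if set(combo) <= tset:
                    counts[combo] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        if not frequent:
            break
        result.update(frequent)
    return result

# Invented co-occurring events from four shift records
transactions = [
    ["gas_alarm", "fan_fault"],
    ["gas_alarm", "fan_fault", "power_dip"],
    ["power_dip"],
    ["gas_alarm", "fan_fault"],
]
freq = frequent_itemsets(transactions, min_support=3)
# freq: {('fan_fault',): 3, ('gas_alarm',): 3, ('fan_fault', 'gas_alarm'): 3}
```

On the platform itself, Mahout's parallel FP-Growth distributes this mining across the cluster as MapReduce jobs, and the resulting itemsets are what get written back to the database for the Spring-based sample application.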
Keywords/Search Tags: islands of information, massive data, data sharing platform, Hadoop