
Research And Implementation Of The Application Migration From Traditional IOE Architecture To The Hadoop Cloud Platform

Posted on: 2016-09-15  Degree: Master  Type: Thesis
Country: China  Candidate: C L Liu  Full Text: PDF
GTID: 2298330467991786  Subject: Computer technology
Abstract/Summary:
With the arrival of the "big data" era, the volume of data held by enterprises keeps growing. In the past, enterprises processed this data on the traditional IOE architecture (composed of IBM minicomputers, Oracle databases and EMC storage). This architecture can no longer meet the requirements, because disk I/O, computing capability and network bandwidth all become bottlenecks once the data grows beyond a certain extent. Hadoop is a suitable replacement for the IOE architecture in this situation. It uses HDFS for storage management and distributes computing tasks with the MapReduce framework. It also relieves the pressure on network bandwidth by shifting the work from the computing level to the data level. When enterprises choose Hadoop as their data processing platform, they need to migrate the applications that run on the IOE platform to the new platform.

In this thesis, we put forward a solution for migrating ETL applications from the IOE platform to the Hadoop platform, and apply it to a real company project. The main research points of this thesis are as follows.

(1) The migration of ETL applications. This involves developing programs that transfer data from relational database tables to the Hive data warehouse; programs that transfer log files from the server to HDFS; programs that parse the structured configuration XML files exported from the IBM WebSphere DataStage ETL tool and transform them into Hive scripts; programs that parse the semi-structured functionality configuration files generated by the E-transform ETL tool and transform them into MapReduce serialized files; and programs that parse Hive scripts.

(2) The optimization of the Hadoop cloud platform and Hive applications. This involves studying the configuration rules for Linux and Hadoop parameters, writing more efficient HiveQL statements, comparing them with the original ones and analyzing the underlying principles, and comparing the performance of the applications before and after the migration.

(3) The design and implementation of monitoring programs for individual Hadoop jobs. These programs collect job information by calling Hadoop APIs and list it, and they also visualize the resource usage of jobs.
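
As an illustration of the first step in point (1), moving a relational table definition into the Hive data warehouse, the following is a minimal sketch only. The abstract does not describe the thesis's actual programs, so the type mapping `ORACLE_TO_HIVE`, the function name `hive_ddl`, the tab-delimited text storage format and the example table are all assumptions made for this example.

```python
# Hypothetical mapping from Oracle column types to Hive types.
# This table is an assumption for illustration, not taken from the thesis.
ORACLE_TO_HIVE = {
    "NUMBER": "DECIMAL(38,10)",
    "VARCHAR2": "STRING",
    "CHAR": "STRING",
    "DATE": "TIMESTAMP",
}


def hive_ddl(table, columns):
    """Build a Hive CREATE TABLE statement from (name, oracle_type) pairs."""
    cols = []
    for name, oracle_type in columns:
        # Drop any length/precision suffix, e.g. VARCHAR2(30) -> VARCHAR2.
        base = oracle_type.split("(")[0].strip().upper()
        # Unknown types fall back to STRING (an assumed, conservative default).
        hive_type = ORACLE_TO_HIVE.get(base, "STRING")
        cols.append(f"  `{name.lower()}` {hive_type}")
    body = ",\n".join(cols)
    return (
        f"CREATE TABLE IF NOT EXISTS `{table.lower()}` (\n{body}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        "STORED AS TEXTFILE;"
    )


if __name__ == "__main__":
    # Example table and columns are made up for this sketch.
    print(hive_ddl("ORDERS", [("ID", "NUMBER(10)"),
                              ("NAME", "VARCHAR2(30)"),
                              ("CREATED", "DATE")]))
```

A generator of this kind would cover only the table definition; exporting and loading the rows themselves would be a separate step.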
Keywords/Search Tags: IOE, Hadoop, application migration, optimization, monitoring