
The Design And Implementation Of USER And LSTG Part Of eBay Hadoop Migration System

Posted on: 2015-09-29    Degree: Master    Type: Thesis
Country: China    Candidate: X Pan
GTID: 2308330461960692    Subject: Software engineering
Abstract/Summary:
As technologies for storing and processing large data sets have improved, many global enterprises have begun to build data warehouses that gather data scattered across different sites for analysis and decision-making. eBay is one of them and has already built its own data warehouse system. However, the current Teradata-based system shows bottlenecks when handling larger data volumes and heavier processing workloads, and it is costly to extend. To solve this problem, the eBay Data Platform Department proposed a new data warehouse plan based on the Hadoop framework. Built on Hadoop and Cascading, with a new approach to data loading, processing and storage, the plan produces a new eBay data warehouse system: the eBay Hadoop Migration System.
The USER and LSTG parts described in this thesis are two of the parts of that system. In designing and implementing them, we proposed a new way of processing data: we adjusted and merged some steps of the former system and removed unnecessary ones. For the detailed logic, we used Cascading pipes and functions to perform joins, updates, deduplication and similar operations, and we wrote classes for common data processing operations and placed them in a shared common module. When transforming the data, we took the properties of the Hadoop environment into account and performed Cascading-based transformations equivalent to the former processing steps. In addition, we designed a job plan in the scheduling system using some special jobs.
Compared with the processes in the former system, the proposed plan is suited to the Hadoop environment and has a simpler processing flow. The Cascading-based detailed logic contains fewer unnecessary steps and is simpler, and the processing time of the whole flow is shortened. Once the plan is put into practice, data storage and data processing for the USER and LSTG parts will be in place, and the data will also be accessible to analysts.
Implementing these two parts will free some resources in the Teradata-based system and ease its shortage of available resources. It also provides a useful trial of, and experience with, applying Hadoop-based technologies to enterprise data warehouse construction and ETL process development.
Keywords/Search Tags: Data Warehouse, Data Process, ETL, Hadoop, Cascading
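The abstract says the thesis implements joins, updates and deduplication with Cascading pipes and functions, but it does not show that code. As a rough illustration only, the following plain-Python sketch (not Cascading's API, and not the author's implementation) shows the update-plus-dedup pattern common in warehouse incremental loads: the day's changed rows are merged into the previous snapshot and, for each key, the newest version is kept. The `Row` fields (`key`, `modified`, `value`) and the functions `dedup_latest` and `apply_delta` are hypothetical names chosen for this example.

```python
from typing import Dict, Iterable, List, NamedTuple


class Row(NamedTuple):
    """Hypothetical table row: business key, last-modified time, payload."""
    key: int
    modified: int
    value: str


def dedup_latest(rows: Iterable[Row]) -> Dict[int, Row]:
    """Keep one row per key: the one with the largest `modified` value.

    Ties go to the row seen later, so delta rows placed after snapshot
    rows override them when the timestamps are equal.
    """
    latest: Dict[int, Row] = {}
    for row in rows:
        current = latest.get(row.key)
        if current is None or row.modified >= current.modified:
            latest[row.key] = row
    return latest


def apply_delta(snapshot: Iterable[Row], delta: Iterable[Row]) -> List[Row]:
    """Insert new keys and update changed ones, returning rows sorted by key."""
    # Snapshot rows come first so that, on equal timestamps, delta rows win.
    merged = dedup_latest(list(snapshot) + list(delta))
    return sorted(merged.values(), key=lambda r: r.key)


if __name__ == "__main__":
    snapshot = [Row(1, 10, "a"), Row(2, 10, "b")]
    delta = [Row(2, 20, "b2"), Row(3, 20, "c"), Row(3, 15, "c-old")]
    for row in apply_delta(snapshot, delta):
        print(row)
```

Concatenating both inputs and keeping the newest row per key is equivalent to a group-by-key step, which is the usual way such update logic is expressed in MapReduce-style frameworks; the actual Cascading pipe assembly used in the thesis may be organized differently.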