
Research And Implementation Of High Reliability Data Integration System Based On Cluster

Posted on: 2017-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: H F Li
Full Text: PDF
GTID: 2308330485988109
Subject: Computer application technology
Abstract/Summary:
With the increasingly widespread application of IT in enterprises, IT systems have been built and updated continuously as enterprises develop. In the informatization of government agencies and large corporations, IT systems are typically constructed in a distributed and phased manner, which gives rise to information silos. As a solution, data integration (DI) technology, also known as ETL technology, logically and physically integrates data of differing sources, formats, and characteristics, and then provides comprehensive data sharing for the enterprise. After years of development, traditional DI technology has been widely applied in data warehousing. In recent years, with the rise of big data and cloud computing, companies have become increasingly dependent on data, and information sources have grown more diverse (e.g., mobile devices and the Internet), drawing growing attention to the problem of integrating heterogeneous data.

Existing DI architectures largely satisfy functional and usability requirements, but they do not meet the demands for high efficiency, high reliability, and scalability in big-data environments. We therefore focus on the concurrent execution of ETL workflows and on transaction-based ETL data processing, redesign the architecture accordingly, and propose a high-reliability DI architecture based on cluster computing.

This thesis is an extension of the Intelligent Big Data Analysis System. Based on actual requirements, we designed a high-reliability cluster-based DI architecture that uses the open-source stream-processing framework Apache Storm to process ETL data. We then proposed an ETL workflow parallelization method and solved the key problem of data caching. To meet the reliability requirements, we proposed a transaction-based ETL data processing method, designed a concurrency control protocol called process-commit, and solved key problems including transaction coordination, transaction triggering, and transaction state management. We also studied and designed a mapping technique that allows logical ETL workflows to run as ETL tasks on the data processing engine. Finally, we conducted a series of experiments to verify the correctness of the system.
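The abstract does not include code; as a rough sketch of how an ETL workflow maps onto a Storm topology (an extract spout feeding parallel transform bolts and a load bolt), assuming Storm 2.x APIs, the wiring might look like the following. All class, stream, and field names here (EtlTopologySketch, ExtractSpout, TransformBolt, LoadBolt, "record", "cleaned") are hypothetical illustrations, not taken from the thesis.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

import java.util.Map;

public class EtlTopologySketch {

    // Extract: a stand-in source that emits raw records; a real system
    // would read from databases, logs, or message queues instead.
    public static class ExtractSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final String[] samples = {" Alice,42 ", "BOB,17", " carol,99 "};
        private int i = 0;

        @Override
        public void open(Map<String, Object> conf, TopologyContext ctx,
                         SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            collector.emit(new Values(samples[i++ % samples.length]));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("record"));
        }
    }

    // Transform: cleans each record; this is the stage the thesis's
    // workflow parallelization method would run concurrently.
    public static class TransformBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String raw = input.getStringByField("record");
            collector.emit(new Values(raw.trim().toLowerCase()));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("cleaned"));
        }
    }

    // Load: writes cleaned records to the target store; printed here.
    public static class LoadBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            System.out.println("load: " + input.getStringByField("cleaned"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("extract", new ExtractSpout(), 1);
        // Parallelism hint 4: several transform executors run concurrently.
        builder.setBolt("transform", new TransformBolt(), 4)
               .shuffleGrouping("extract");
        builder.setBolt("load", new LoadBolt(), 1)
               .shuffleGrouping("transform");

        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("etl-sketch", new Config(),
                                   builder.createTopology());
            Utils.sleep(5000);
        }
    }
}
```

Note that this sketch only shows the mapping of ETL stages onto a topology; the reliability guarantees the thesis targets would sit on top of it, for example via Storm's tuple acking or the transaction-based processing and process-commit protocol described above, whose details are specific to the thesis.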
Keywords/Search Tags: Data Integration, ETL workflow, Storm cluster, Transaction, concurrency control