Research On ETL System Of Data Warehouse Based On Crowdsourcing

Posted on: 2020-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhou
GTID: 2428330575959879
Subject: Computer technology
Abstract/Summary:
ETL is the entry point for data acquisition in a data warehouse and a key factor in determining the warehouse's quality. Data warehouses built today must cope with a wide range of data sources, large data volumes, deviations in the data, and a growing share of uncertain and incomplete data, and traditional ETL systems struggle to meet these needs. This thesis applies the theory and methods of network crowdsourcing, combining traditional data warehouse ETL methods with crowdsourcing to study and construct a crowdsourcing-based data warehouse ETL system. The main work and results are as follows.

Firstly, the defects and shortcomings of traditional data warehouse ETL systems are analyzed. Faced with big data conditions such as a wide range of data sources, many data types, and more uncertain and incomplete data, traditional ETL systems lack effective means of processing; data processing needs to be assisted by human knowledge and methods.

Secondly, a general architecture for a crowdsourcing-based data warehouse ETL system is designed. Building on crowdsourcing theory and the traditional ETL system architecture, an architecture and software platform are designed and established that combine manual and machine processing for data extraction, transformation, and loading.

Thirdly, a crowdsourcing-based language for processing uncertain data in ETL and a crowdsourcing evaluation and control algorithm are designed. Standard SQL statements are given simple extensions to meet the needs of crowdsourcing in the ETL process, yielding an adaptive extension language for crowdsourcing-based data warehouse ETL processing. A matrix-based algorithm and the hierarchical response model are used to evaluate crowdsourcing tasks and control crowdsourcing results.

Fourthly, in the context of oil drilling material supply management, a prototype crowdsourcing-based data warehouse ETL system is developed. It processes data effectively while the drilling materials data warehouse is being built, changes the traditional ETL data processing mode, and further improves the efficiency and quality of ETL data processing.
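As an illustration of the combined manual-and-machine processing the abstract describes, the following is a minimal sketch of one transformation step in which values the machine rule cannot resolve are routed to crowd workers. It is not the thesis's method: a plain majority vote with an agreement threshold stands in for the matrix-based algorithm and response model named above, and every name here (`aggregate_votes`, `transform`, `machine_clean`, `ask_crowd`, the `unit` field, the sample records) is a hypothetical chosen for the example.

```python
from collections import Counter
from typing import Callable, Optional


def aggregate_votes(answers: list[str], min_agreement: float = 0.6) -> Optional[str]:
    """Return the most common answer if its share reaches min_agreement, else None.

    Assumption: simple majority voting, used here only as a stand-in for
    the evaluation and control algorithm of the thesis.
    """
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count / len(answers) >= min_agreement else None


def transform(
    records: list[dict],
    field: str,
    machine_clean: Callable[[object], Optional[str]],
    ask_crowd: Callable[[dict], list[str]],
    min_agreement: float = 0.6,
) -> tuple[list[dict], list[dict]]:
    """Clean one field: machine rule first, crowd answers as a fallback.

    Returns (loadable records, records that remain unresolved).
    """
    loaded, unresolved = [], []
    for record in records:
        cleaned = machine_clean(record.get(field))
        if cleaned is None:
            # The machine rule failed, so hand the record to the crowd.
            cleaned = aggregate_votes(ask_crowd(record), min_agreement)
        if cleaned is None:
            unresolved.append(record)
        else:
            loaded.append({**record, field: cleaned})
    return loaded, unresolved


# Hypothetical drilling-material records with an uncertain "unit" field.
KNOWN_UNITS = {"kg": "kg", "kilogram": "kg", "t": "t"}
SIMULATED_CROWD = {2: ["kg", "kg", "t"], 3: ["kg", "t", "m"]}


def machine_clean(value: object) -> Optional[str]:
    """Map a raw unit string to a known unit, or None if unrecognized."""
    if not isinstance(value, str):
        return None
    return KNOWN_UNITS.get(value.strip().lower())


def ask_crowd(record: dict) -> list[str]:
    """Simulated worker answers; a real system would post a crowd task."""
    return SIMULATED_CROWD.get(record["id"], [])


records = [
    {"id": 1, "unit": "KG"},
    {"id": 2, "unit": "kilo"},
    {"id": 3, "unit": "??"},
]
loaded, unresolved = transform(records, "unit", machine_clean, ask_crowd)
print(loaded)
print(unresolved)
```

In this sketch, record 1 is resolved by the machine rule, record 2 by two of three simulated workers agreeing, and record 3 stays unresolved because no answer reaches the threshold. Keeping unresolved records separate rather than loading a guess is one possible way to control result quality.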
Keywords/Search Tags: Crowdsourcing, Data Warehouse, ETL, Prototype System