Design And Implementation Of Data Aggregation System For Isomerism And Heterogeneous Data

Posted on: 2022-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
Full Text: PDF
GTID: 2518306332468554
Subject: Computer technology
Abstract/Summary:
In recent years, with the development of big data technologies and applications, the data ingested by big data platforms has come to be characterized by massive volume, isomerism, heterogeneity, and streaming arrival. Here, "isomerism data" refers to data arriving in a variety of formats, while "heterogeneous data" refers to data of uneven quality. Against this background, building a "data aggregation system" to access, preprocess, and distribute isomerism and heterogeneous streaming data has become a hot topic in industry.

The purpose of this project is to design and implement a data aggregation system for isomerism and heterogeneous data. The system implements data access and distribution functions, data preprocessing functions, and system management functions. Data access provides adaptive access to different data structures from multiple sources; data preprocessing performs format checking and cleaning of the accessed data; data distribution delivers data to multiple applications.

To address diverse data sources and flexible data formats, a unified data format description is introduced, which enables the unified representation of, and flexible access to, heterogeneous data formats from multiple sources. To address the heterogeneity of streaming data, an incremental streaming data cleaning algorithm based on a GAN model (SDC-IGAN) is proposed. Multi-layer LSTMs serve as the generator and the discriminator of the GAN to capture the temporal relationships between streaming data points, enabling the recognition and repair of outliers in time series data. To handle concept drift in streaming data, SDC-IGAN uses an online incremental learning strategy to recognize and repair outliers in online real-time streaming data. Compared with existing methods, this method achieves better accuracy in streaming data scenarios and effectively overcomes catastrophic forgetting in online incremental learning.

This thesis first introduces the research background of data aggregation systems for isomerism and heterogeneous data. Then, based on an investigation of several data aggregation systems in industry, the requirements of a data aggregation system for isomerism and heterogeneous data are analyzed. Next, a unified representation of isomerism streaming data and a GAN-based incremental streaming data cleaning method are proposed. The design and implementation of the data aggregation system for isomerism and heterogeneous data are then described in detail. Finally, a series of tests verifies the validity of the system.
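
The abstract does not specify what the unified data format description looks like. As a rough illustration only, the following Python sketch shows one way a single descriptor could drive both parsing and format checking for records that arrive as JSON or as delimited text. The names `FieldSpec`, `FormatDescription`, and `parse_record`, and the sample sensor fields, are hypothetical and are not taken from the thesis.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


# Hypothetical descriptor types: illustrative only, not the thesis's design.
@dataclass(frozen=True)
class FieldSpec:
    name: str
    cast: Callable[[Any], Any]  # converts a raw value to the target type
    required: bool = True


@dataclass(frozen=True)
class FormatDescription:
    encoding: str            # "json" or "csv" in this sketch
    fields: List[FieldSpec]
    delimiter: str = ","     # only used for the "csv" encoding


def parse_record(raw: str, fmt: FormatDescription) -> Dict[str, Any]:
    """Decode one raw record according to fmt and check its fields."""
    if fmt.encoding == "json":
        source = json.loads(raw)
        if not isinstance(source, dict):
            raise ValueError("JSON record must be an object")
    elif fmt.encoding == "csv":
        row = next(csv.reader(io.StringIO(raw), delimiter=fmt.delimiter))
        # Columns are matched to fields by position.
        source = dict(zip((f.name for f in fmt.fields), row))
    else:
        raise ValueError(f"unsupported encoding: {fmt.encoding}")

    record: Dict[str, Any] = {}
    for spec in fmt.fields:
        value = source.get(spec.name)
        if value is None or value == "":
            if spec.required:
                raise ValueError(f"missing required field: {spec.name}")
            record[spec.name] = None
            continue
        # A failed cast (e.g. float("abc")) raises ValueError: a format error.
        record[spec.name] = spec.cast(value)
    return record


# Two sources with different wire formats share one field list.
fields = [
    FieldSpec("sensor_id", str),
    FieldSpec("temperature", float),
    FieldSpec("note", str, required=False),
]
json_fmt = FormatDescription("json", fields)
csv_fmt = FormatDescription("csv", fields, delimiter=";")

a = parse_record('{"sensor_id": "s1", "temperature": 21.5}', json_fmt)
b = parse_record("s1;21.5;", csv_fmt)
```

In this sketch both calls yield the same normalized record, so downstream cleaning and distribution steps would not need to know which source format a record came from. A real system would likely need more encodings, nested fields, and richer validation rules.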
Keywords/Search Tags: data aggregation, isomerism and heterogeneous, data cleaning, incremental learning