Research On Key Technologies Of Multi-Source Heterogeneous Data Fusion

Posted on:2021-05-14

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C Feng

GTID:1368330605981265

Subject:Software engineering

Abstract/Summary:

With the rapid development of big data technology, data fusion, which is based on machine learning theory and supported by sensing data, has become a hot research field and has been widely used in smart city systems such as smart healthcare, smart home, and smart transportation. As the amount of sensing data grows, the differences in data types, data relationships, and data quality grow with it. In addition, there are large amounts of unlabeled data, sparse data areas, and domain knowledge. Furthermore, the problem of distributed multi-source heterogeneous data fusion caused by data privacy, data security, and transmission restrictions cannot be ignored.

This thesis studies four key problems of multi-source heterogeneous data fusion: single-model data fusion, structured data fusion, cross-domain knowledge fusion, and data fusion in a distributed environment. The proposed methods are verified on real-world data. The results achieved are as follows:

1. For multi-source heterogeneous data fusion, this thesis proposes a random-forest-based algorithm called MCS-RF. MCS-RF is a single model that combines an offline semi-supervised random forest with an online semi-supervised random forest. It addresses the problems caused by heterogeneous, sparse, and unlabeled data in unstructured multi-source heterogeneous data fusion. To verify its effectiveness, fine-grained real-time PM2.5 inference in Beijing is taken as an example. The experimental results show that MCS-RF can effectively fuse multi-source heterogeneous data and improve inference accuracy.

2. For multi-source heterogeneous data fusion, this thesis also proposes an algorithm based on ensemble learning. Unlike MCS-RF, this algorithm trains on the data by constructing multiple independent sub-models. It analyzes and models data features that are common in urban sensing data, such as time-series attributes, spatial topology, and real-time data. The sub-models are combined through a neural network. To verify its effectiveness, fine-grained air quality estimation in Beijing is carried out on urban sensing data. The experimental results show that the algorithm can effectively use the features of multi-source heterogeneous data and improve inference accuracy.

3. For the fusion of cross-domain knowledge and data, this thesis proposes a cross-domain knowledge fusion algorithm based on machine learning. The algorithm approximates the domain knowledge model and uses data to fit the parameters of the approximate model, which solves the problem of deploying the domain knowledge model on urban sensing data. Air quality prediction is taken as an example to verify the algorithm. The experimental results show that the algorithm can effectively use cross-domain knowledge and improve prediction accuracy.

4. For data fusion in the fog computing environment, this thesis proposes a multi-source heterogeneous data fusion mechanism that consists of a local heterogeneous data fusion system and a centralized homogeneous data training system. The mechanism iteratively optimizes the model with a parameter averaging method based on data volume and data quality. Environmental monitoring in the fog computing environment is taken as an example to verify the mechanism. In the experiment, urban sensing data are partitioned to simulate the data distributions of a fog computing environment, and the mechanism is verified on both independent and identically distributed (IID) data and non-IID data. The experimental results show that the mechanism achieves high-precision model training without data sharing and addresses problems such as data sparsity, model overfitting, data heterogeneity, and model heterogeneity.
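
The parameter averaging idea in the fourth result can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so this example assumes each node's weight is its sample count multiplied by a quality score in [0, 1], normalized across nodes. The function name, the argument names, and the weighting rule are all hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def weighted_parameter_average(
    local_params: Sequence[Sequence[float]],
    sample_counts: Sequence[int],
    quality_scores: Sequence[float],
) -> List[float]:
    """Average local parameter vectors from several nodes.

    Assumption (not from the thesis): the weight of a node is
    sample_count * quality_score, normalized so the weights sum to 1.
    """
    if not local_params:
        raise ValueError("at least one node is required")
    if not (len(local_params) == len(sample_counts) == len(quality_scores)):
        raise ValueError("one sample count and one quality score per node")

    dim = len(local_params[0])
    if any(len(params) != dim for params in local_params):
        raise ValueError("all parameter vectors must have the same length")

    # Unnormalized weight per node: data volume scaled by data quality.
    raw_weights = [n * q for n, q in zip(sample_counts, quality_scores)]
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted sum of the parameter vectors, coordinate by coordinate.
    averaged = [0.0] * dim
    for params, weight in zip(local_params, raw_weights):
        for i, value in enumerate(params):
            averaged[i] += (weight / total) * value
    return averaged


# Three hypothetical fog nodes, each holding a two-parameter local model.
global_params = weighted_parameter_average(
    local_params=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    sample_counts=[100, 300, 100],
    quality_scores=[1.0, 0.5, 0.5],
)
print(global_params)
```

In an iterative scheme of this kind, the averaged vector would be sent back to the nodes as the starting point for the next round of local training, so only parameters, not raw data, leave each node.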

Keywords/Search Tags:

big data, multi-source heterogeneous data fusion, knowledge fusion, neural network, machine learning

Related items

1	Research On Heterogeneous Knowledge Fusion Methods In Big Data Environment
2	Research On Knowledge Graph Completion Based On Multi-source Heterogeneous Information Fusion
3	Research On Multi-source Data Fusion For The Question And Answer Of Subject Knowledge
4	Multi-source Sensor Data Fusion And Its Applications In The Target Detection
5	Knowledge Graph Construction And Application For Enterprise Based On Multi-source And Heterogeneous Data
6	Research On Urban Multi-source Heterogeneous Data Fusion Methods And Applications
7	Research On Financial Time Series Forecasting Based On Multi-source Fusion Data
8	Research On Heterogeneous Multisource Multimodal Data Fusion Based On Digital Twin
9	Heterogeneous Fusion Of Multi-Source Atmospheric Data And Analysis Of Spatio-Temporal Data Mining
10	Multi-source Knowledge Base Fusion And Application For Heterogeneous Data Sources