
Research On Data Quality Assessment Of Multi-source Data Fusion

Posted on: 2021-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Jiang
Full Text: PDF
GTID: 2518306230978159
Subject: Software engineering
Abstract/Summary:
In the era of big data, more and more application scenarios no longer rely on single-source data for characterization, but instead use multi-source data fusion for research and analysis to improve the comprehensiveness and accuracy of research results. Data quality assessment of multi-source data fusion can provide high-quality data for many fields. Traditional data quality assessment is generally based on single-source data, so there is no unified system for assessing the quality of multi-source data fusion. At present, unstructured data is growing rapidly and contains great value, and its importance is widely recognized. Because the techniques for analyzing unstructured data differ across application scenarios and are difficult to apply, evaluating the quality of multi-source unstructured data fusion has become a challenge. Against the current background of big data, the quality assessment of multi-source data fusion is an important research topic in the field of data quality assessment.

This paper studies the key techniques of data quality assessment in multi-source data fusion scenarios. First, the data quality problems of multi-source data fusion are analyzed. According to the actual needs of the data fusion scenario, a quality evaluation framework is constructed, which includes quality dimensions, evaluation indicators, and an evaluation model for each indicator. Second, for the evaluation of unstructured data quality, a topic relevance evaluation index is designed and a novel quality evaluation method is proposed. This method uses a deep-learning-based image description generation model together with natural language processing technology to build a text similarity calculation model, which makes it possible to evaluate the relevance between multi-source unstructured data. Finally, since this paper proposes a multi-dimensional, multi-index evaluation system, an overall index evaluation method based on the entropy weight method and the 1-9 scale method is designed.

The experiments in this paper use the multi-source location POI data set for Kunming city (structured data), the multi-platform (Baidu, Sina, WeChat) hot list data set, and the Twitter sentiment analysis data set (unstructured data). According to the characteristics of each data set, different evaluation indicators are selected from the evaluation framework for experimental analysis in order to verify the feasibility of the framework. The experimental results not only give specific data quality scores but also directly reflect the data quality status. This shows that the study can provide a solution for data quality assessment in multi-source data fusion scenarios such as public opinion monitoring and urban hot spot mining, and that the research results can provide practical support for mining association relationships between multi-source unstructured data.
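As an illustration of the entropy weight step mentioned in the abstract, the following is a minimal sketch of the textbook entropy weight method, not the thesis's actual implementation. The function names `entropy_weights` and `overall_scores`, the example matrix, and the assumption that all indicators are non-negative, benefit-type scores already scaled to [0, 1] are hypothetical. How the thesis combines these objective weights with the 1-9 scale method is not stated in the abstract, so that part is not shown.

```python
import math


def entropy_weights(matrix):
    """Textbook entropy weight method (illustrative, not the thesis's code).

    matrix: list of rows, one per data source; each row holds non-negative
    indicator scores where higher is better. Needs at least two rows.
    Returns one weight per indicator (column); weights sum to 1.
    """
    n = len(matrix)
    m = len(matrix[0])
    if n < 2:
        raise ValueError("need at least two data sources")
    k = 1.0 / math.log(n)  # scales entropy into [0, 1]

    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            entropy = 1.0  # an all-zero indicator carries no information
        else:
            proportions = [v / total for v in column]
            entropy = -k * sum(p * math.log(p) for p in proportions if p > 0)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))

    spread = sum(divergences)
    if spread == 0:
        return [1.0 / m] * m  # no indicator discriminates: equal weights
    return [d / spread for d in divergences]


def overall_scores(matrix, weights):
    """Weighted sum of indicator scores for each data source."""
    return [sum(w * v for w, v in zip(weights, row)) for row in matrix]


# Hypothetical scores: 3 data sources x 2 indicators.
example = [
    [0.9, 0.5],
    [0.9, 0.7],
    [0.9, 0.3],
]
weights = entropy_weights(example)
scores = overall_scores(example, weights)
```

In this example the first indicator is identical across all sources, so it receives almost no weight, and the overall ranking is driven by the second indicator. That is the intended behavior of entropy weighting: indicators that vary more across sources are treated as more informative.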
Keywords/Search Tags: Data quality, Quality assessment framework, Unstructured data, Topic relevance