
Research And Application Of Data Warehouse Construction Method Based On Semantics

Posted on: 2021-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: J H Wang
GTID: 2518306503473934
Subject: Software engineering
Abstract/Summary:
Data warehouses are well suited to synthesizing, classifying, and analyzing data. Integrating quality data from across the component life cycle into a data warehouse can therefore support component selection. However, building a quality data warehouse that covers a component's entire life cycle raises two problems. First, the data are difficult to represent uniformly. Many departments are involved in the component's life cycle, and the data scattered among them are incomplete in their constituent elements. Without a unified data representation model, data cannot be transferred or exchanged between the multiple sources. Second, the instance data are difficult to transform. Because each department's data are semantically heterogeneous and massive in volume, semantic technology is needed to eliminate the semantic heterogeneity of the instance data before they are transformed into the data representation model, and an efficient transformation method is needed to handle the volume.

To address these problems, this thesis proposes a semantics-based data warehouse construction method. The method uses semantic technology to construct a data metamodel so that the component data of every department can be represented uniformly. Combined with semantic technology, the instance data of the component's entire life cycle are then efficiently extracted, cleaned, and loaded into the data warehouse. The data warehouse gives designers reliable component-related quality information when they select components. Through parameter matching and intelligent analysis, the components that designers need can be provided dynamically, which helps avoid quality risks in the selection. The main contents of this thesis are as follows:

(1) Research on the framework of the semantics-based data warehouse construction method. This thesis proposes a framework for the semantics-based data warehouse construction method. It consists of a data metamodel building module, a data warehouse building module, and an application module, which implement data warehouse modeling, the import of instance data into the data warehouse, and the component selection recommendation application, respectively.

(2) Data metamodel construction for the entire life cycle of components. This thesis proposes constructing a data metamodel for the entire life cycle of components. The metamodel eliminates, at the conceptual level, the semantic heterogeneity in how the component departments represent data, and so solves the problem of unified data representation. First, business concepts and relationships are extracted from the business forms used throughout the component life cycle, and metadata is obtained through semantic fusion of the business concepts. Then, metadata-based dimensional modeling of the data warehouse is carried out to form the data warehouse metamodel.

(3) Construction of the data warehouse based on semantics. This thesis studies data extraction, cleaning and transformation, loading, and metamodel updating. The work covers two aspects. On the one hand, it addresses the semantic heterogeneity of the data. A data cleaning and transformation method based on a synonym dictionary plus rules resolves semantic heterogeneity at the instance level. When a data source changes, a concept updating method based on child node matching updates the data warehouse metamodel and so resolves semantic heterogeneity at the conceptual level. On the other hand, it addresses the efficiency of transforming massive data into the data warehouse. An incremental extraction method based on timestamps and logs improves the efficiency of extracting massive data. A loading method oriented to real-time queries addresses both the slow loading of massive data and the low efficiency of real-time queries over massive data in the data warehouse.

Finally, a component quality data warehouse is built on the Transwarp Data Hub platform, and the component selection and recommendation application is used to verify the effectiveness of the proposed method.
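
The abstract does not give implementation details, so the following is only a minimal sketch of two of the ideas named in (3): timestamp-based incremental extraction and cleaning with a synonym dictionary plus rules. Every field name, dictionary entry, and rule here (`part_no`, `fail_rate`, `SYNONYMS`, `RULES`, and so on) is a hypothetical illustration and is not taken from the thesis; the log-based half of the extraction method is not shown.

```python
from datetime import datetime

# Hypothetical synonym dictionary: department-specific field names
# mapped to the unified term of the data metamodel.
SYNONYMS = {
    "part_no": "component_id",
    "item_code": "component_id",
    "fail_rate": "failure_rate",
    "defect_ratio": "failure_rate",
}


def _to_fraction(value):
    """Hypothetical rule: accept '5%' or 0.05 and return a fraction."""
    text = str(value).strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100
    return float(text)


# Hypothetical cleaning rules, keyed by unified field name.
RULES = {
    "component_id": lambda v: str(v).strip().upper(),
    "failure_rate": _to_fraction,
}


def extract_incremental(rows, last_run):
    """Timestamp-based incremental extraction (assumed form):
    keep only rows modified after the previous extraction time."""
    return [row for row in rows if row["updated_at"] > last_run]


def clean(row):
    """Rename fields through the synonym dictionary, then apply
    the value rule registered for the unified field, if any."""
    cleaned = {}
    for key, value in row.items():
        name = SYNONYMS.get(key, key)
        rule = RULES.get(name)
        cleaned[name] = rule(value) if rule else value
    return cleaned


# Two departments describe the same kind of record differently.
source_rows = [
    {"part_no": " c-101 ", "fail_rate": "5%",
     "updated_at": datetime(2021, 3, 1)},
    {"item_code": "c-202", "defect_ratio": 0.02,
     "updated_at": datetime(2021, 6, 1)},
]

last_run = datetime(2021, 4, 1)
new_rows = [clean(r) for r in extract_incremental(source_rows, last_run)]
all_rows = [clean(r) for r in source_rows]
```

With these sample rows, only the second record is newer than `last_run`, so `new_rows` holds a single cleaned record, while `all_rows` shows both departments' records expressed with the same field names. A real pipeline would read the last extraction time from persistent state and would need a policy for rows that lack a reliable timestamp, which is presumably where the log-based method comes in.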
Keywords/Search Tags: Metadata, Data Warehouse, ETL, Component Selection Recommendation