
Research On Schema Mapping Method Based On Domain Constraint

Posted on: 2022-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
GTID: 2518306575959599
Subject: Computer technology
Abstract/Summary:
With the continuous development and popularization of Internet technology, more and more companies conduct their office work online, so a large amount of business data is transmitted over the Internet every day. At the same time, the structures and sources of Web page data have become increasingly varied. How to integrate data with different structures and from different sources under a unified standard, so that enterprise decision-makers have a basis for analysis and decision-making, is becoming a hot topic in big data research. Data exchange is one of the important research areas in the integration of heterogeneous Web big data, and it is usually studied at two levels: the instance level and the schema level. This thesis focuses on the schema level.

Many problems in schema mapping research remain open. On the one hand, most initial mapping rule sets are designed by experts who are familiar with the semantics of the data involved. How a non-expert user can rely on their own domain knowledge to design an initial mapping rule set that meets the needs of the domain is therefore a main problem in current schema mapping research. On the other hand, given that an initial mapping rule set already exists, how to select the mapping rule set that best fits the source schema and the target schema is also a research focus in the field.

For the generation of the initial mapping rule set, and building on a real-world real estate big data research project, this thesis proposes an improved method for generating and optimizing the initial mapping rule set from user-provided example tuples. The method first preprocesses the representative data instances and the mapping rule sets initially provided by non-expert users, and then applies atomic optimization and connection (join) optimization to the resulting normalized rule set. Finally, drawing on the domain knowledge of the non-expert user, the validity of the initial example tuples is checked recursively through Boolean (yes/no) queries posed in simple user interactions, which yields an initial mapping rule set that meets the user's needs.

For the selection and optimization of the mapping rule set, and building on the work on initial rule set generation, this thesis implements a probability-based schema mapping selection and optimization method that combines earlier related research results with the homomorphism relationship between data instances. The method first defines a cost function for mapping rules without existential quantifiers, then extends the function so that it applies to arbitrary mapping rules, and finally uses the theory of probabilistic soft logic (PSL) to optimize the extended cost function and obtain an approximately optimal solution. In this way, the automatic selection of, and reasoning over, mapping rules is realized.

Finally, this thesis evaluates the two methods separately on real Web real estate big data sets and verifies their effectiveness and advantages in terms of the choice of mapping space exploration strategy, the rule selection rate, and the running time.
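The homomorphism relationship between data instances mentioned above can be illustrated with a small sketch. The abstract does not say how the thesis computes homomorphisms, so everything below is an assumption made for illustration: instances are modeled as sets of facts `(relation_name, values)`, the hypothetical `Null` class marks labeled nulls, and the hypothetical `find_homomorphism` function performs a plain backtracking search. A homomorphism must map every constant to itself and every fact of the source instance onto some fact of the target instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Null:
    """A labeled null (placeholder value); any other value is a constant."""
    name: str


def find_homomorphism(source, target):
    """Return a dict mapping the nulls of `source` to values of `target`,
    or None if no homomorphism exists.

    Both instances are sets of facts; a fact is (relation_name, values_tuple).
    Illustrative backtracking search, not the thesis's own algorithm.
    """
    facts = sorted(source, key=repr)  # fixed order for a reproducible search
    target_by_rel = {}
    for rel, values in target:
        target_by_rel.setdefault(rel, []).append(values)

    def extend(i, h):
        if i == len(facts):
            return h  # every source fact has an image in the target
        rel, values = facts[i]
        for candidate in target_by_rel.get(rel, []):
            if len(candidate) != len(values):
                continue
            h2 = dict(h)
            ok = True
            for v, c in zip(values, candidate):
                if isinstance(v, Null):
                    # A null may map to anything, but consistently.
                    if h2.setdefault(v, c) != c:
                        ok = False
                        break
                elif v != c:
                    # Constants must be mapped to themselves.
                    ok = False
                    break
            if ok:
                result = extend(i + 1, h2)
                if result is not None:
                    return result
        return None

    return extend(0, {})


# Hypothetical real-estate style instances, invented for this example.
x = Null("x")
generated = {("House", ("h1", x)), ("Agent", (x, "Li"))}
expected = {("House", ("h1", "a7")), ("Agent", ("a7", "Li"))}

forward = find_homomorphism(generated, expected)
backward = find_homomorphism(expected, generated)
```

In this example a homomorphism exists from `generated` to `expected` (the null `x` is mapped to the constant `a7`), but not in the other direction, because the constant `a7` would have to be mapped to a null.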
Keywords/Search Tags: Web data exchange, Schema mapping, Homomorphism, Boolean query, Probabilistic soft logic