A Semantic Description And Data Query Oriented Big Data Organization Method And Researches On Key Application Technologies

Posted on: 2019-03-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Wang
GTID: 1368330611967032
Subject: Computer system architecture
Abstract/Summary:
The spread and continued development of technologies such as big data and the Internet of Things have caused data to expand rapidly, not only in volume but also in type and format. Because data schemas and modes of operation vary so widely, data end up in a large number of independent collections that cannot be queried or processed uniformly, which hinders data interoperability. This makes unified, efficient use of all kinds of data difficult and poses great challenges for mining valuable information from these vast amounts of data. Storing and manipulating such heterogeneous data with traditional methods is increasingly unable to meet current application needs. For example, big data models such as NoSQL often have no fixed schema, and their data structures often change dynamically, which is the main obstacle to integrating them with traditional data. Research on schema integration between big data models and traditional data models is still insufficient, and issues such as the semantic description of big data still lack comprehensive study. There is therefore a need for a unified, efficient and sufficiently flexible way to describe all kinds of heterogeneous data, and to express the semantics both within data and among heterogeneous data, so as to discover the intrinsic value of the data and the knowledge latent in it.

Based on a thorough comparison and analysis of public models for heterogeneous data integration, this dissertation draws on the characteristics and advantages of the major related models and proposes a concept-and-relation-oriented public data model called GDM (Grid Data Model). Founded on the definitions of Relation, Paragraph and Section, GDM provides a new approach to data schema definition and structure organization that is capable of describing all kinds of data structures and semantic relations. A formal definition of GDM is also given. To explain the semantic description and logical reasoning abilities of GDM in more depth, the dissertation illustrates the principles of semantic reasoning and domain knowledge evolution in terms of the basic GDM concepts. Taking the () description logic as an example, it also describes how to establish a mapping to description logic through a subset of the GDM grammar, and how to build a domain knowledge base with GDM on the basis of description logic. The necessary theoretical proofs for the relevant reasoning problems of GDM are then carried out.

Next, the dissertation studies data structure heterogeneity in data integration. To integrate the various traditional data models and big data models, it uses the relation-oriented data structure description mechanism of GDM and investigates, from the perspective of formal theory, the principles of schema transformation from various data models to GDM, including the structured relational model, semi-structured XML and a variety of unstructured NoSQL data models. It also studies how GDM can simultaneously describe the hybrid schema characteristics of schemaful and schemaless data, as well as its capability for dynamically modifying data. The dissertation then defines the GDM model algebra and the syntax of the query language GDM SQL, and explains the basic principles of the GDM data query process and query optimization. This data management scheme provides the basic methods for querying and operating on Grid Data, which is a necessary prerequisite for heterogeneous data integration with GDM.

On the basis of the model definition, the relevant theories and the query language, the dissertation studies several aspects of query, processing and optimization for heterogeneous data integration in a distributed environment and solves the related problems, such as query variable association, query decomposition and query plan generation, and the parallel scheduling of the query process. To reduce the time cost of heterogeneous data query processing, it also proposes several query optimization schemes based on the minimum scheduling connected graph, and compares the performance of the optimization strategies through simulation experiments to verify the effectiveness of the query optimization method.

To further demonstrate the characteristics of GDM and its advantages in data integration efficiency, the dissertation compares GDM with several basic data models from various aspects, with a particular focus on an in-depth comparison with the OWL model. Using the efficiency evaluation model proposed in the dissertation, the time and space efficiencies of each model in data creation, modification and deletion are comparatively analyzed. The results show that, taken overall, GDM achieves the best time and space efficiency in data integration among the compared models and is well suited to heterogeneous data integration. Finally, the dissertation designs a heterogeneous data integration system based on GDM, introduces the design framework and implementation process of the system, demonstrates its operation, and verifies the feasibility and effectiveness of the proposed theory. The system shows that GDM performs well in data integration and knowledge discovery in distributed, heterogeneous environments.
Keywords/Search Tags:GDM model, logic deduction, schema transformation, heterogeneous big data integration, query processing and optimization, integration efficiency