
Research Of Mixed Type Data Management Based On Distributed Database Middleware

Posted on: 2018-09-10   Degree: Master   Type: Thesis
Country: China   Candidate: H Xue   Full Text: PDF
GTID: 2348330536452505   Subject: Computer Science and Technology
Abstract/Summary:
Recently, the spread of information technology through human society has produced large amounts of data in enterprise, science, the Internet, and other fields, ushering in the big data era. More and more applications involve the storage and query processing of big data, whose scientific and commercial value is gradually becoming apparent. However, the rapid growth in the size and complexity of big data poses a huge challenge to existing data management technologies. As massive heterogeneous data grows rapidly, the storage and computing limitations of centralized databases are becoming increasingly obvious, and distributed data management has become a general trend. Distributed database middleware gives users a transparent way to build database clusters and provides simple, convenient distributed support for open source relational databases such as MySQL and PostgreSQL. In fact, distributed database middleware can integrate different types of databases at the bottom layer with applications in the upper layers. Therefore, if the underlying relational and NoSQL databases are integrated in a unified way, adaptive storage and query optimization can be expected for data of different structures from different sources, thus realizing unified management of multi-source heterogeneous data.

First, the thesis describes the characteristics of big data and its multi-source heterogeneity, points out the defects of using a single kind of database for multi-source heterogeneous big data, and shows the necessity of using multiple types of databases at the same time. Second, the thesis details the concepts, principles, characteristics, and representative products of distributed middleware, discusses the possibility of using distributed database middleware to manage mixed data, and analyzes the deficiencies of existing distributed middleware in supporting different types of data. On this basis, the thesis proposes a hybrid data management framework based on distributed database middleware and introduces the architecture of the framework.

In addition, the thesis designs a hybrid data query mechanism based on distributed database middleware. For the widely existing semi-structured-relational mixed data and unstructured-relational mixed data, the thesis designs query mechanisms based on MongoDB-MySQL and Hadoop-HANA. The thesis designs SQL query statements to provide a unified upper-layer query interface, implements new query parsing, query interception, and query push functions in the open source distributed database middleware Mycat, and further designs query algorithms for mixed-type data to implement the query mechanism. Finally, the thesis uses real datasets downloaded from a medical consulting website and TPC-H benchmark datasets to verify the function and performance of the proposed framework and algorithms. The experimental results show that the proposed methods are effective and represent a viable attempt at multi-source heterogeneous data management.
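The abstract names query parsing, interception, and pushing to different backends but gives no implementation detail. The following is only a minimal illustrative sketch of that general idea, not the thesis's algorithm and not Mycat's API. The catalog, table names, in-memory stores, and every function name are hypothetical; no real MySQL or MongoDB instance is contacted.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical catalog mapping each table to the kind of store that holds it.
CATALOG: Dict[str, str] = {
    "patients": "relational",     # stand-in for a relational table
    "consultations": "document",  # stand-in for a document collection
}

# In-memory stand-ins for the two stores (assumed sample data).
RELATIONAL_ROWS: Dict[str, List[dict]] = {
    "patients": [{"id": 1, "name": "Li"}, {"id": 2, "name": "Wang"}],
}
DOCUMENT_ROWS: Dict[str, List[dict]] = {
    "consultations": [{"patient_id": 1, "text": "headache"}],
}

# Very small "parser": find the first table name after FROM.
_FROM = re.compile(r"\bfrom\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def parse_table(sql: str) -> str:
    """Return the lower-cased table name of a simple SELECT statement."""
    match = _FROM.search(sql)
    if match is None:
        raise ValueError("no FROM clause found in: " + sql)
    return match.group(1).lower()


def query_relational(table: str) -> List[dict]:
    """Pretend to push the query to the relational backend."""
    return list(RELATIONAL_ROWS[table])


def query_document(table: str) -> List[dict]:
    """Pretend to push the query to the document backend."""
    return list(DOCUMENT_ROWS[table])


BACKENDS: Dict[str, Callable[[str], List[dict]]] = {
    "relational": query_relational,
    "document": query_document,
}


def route(sql: str) -> Tuple[str, List[dict]]:
    """Intercept a SQL string, pick a backend by table, and return its rows."""
    table = parse_table(sql)
    kind = CATALOG.get(table)
    if kind is None:
        raise KeyError("unknown table: " + table)
    return kind, BACKENDS[kind](table)
```

Calling `route("SELECT * FROM patients")` selects the relational stand-in, while `route("SELECT * FROM consultations")` selects the document stand-in; the caller issues the same SQL-style request in both cases, which is the sense in which a middleware layer can present a unified upper interface.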
Keywords/Search Tags:Big Data, Distributed Database Middleware, Structured Data, Unstructured Data