Query Optimization In Distributed Database Middleware

Posted on: 2017-05-26 | Degree: Master | Type: Thesis
Country: China | Candidate: W Ye
GTID: 2308330503453769 | Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of information, the volume of data to be processed keeps increasing, and the storage and computing limitations of traditional databases are becoming more and more apparent. Distributed architectures have become an inevitable trend in data management. To provide distributed support for MySQL, PostgreSQL, and other widely used open-source databases, a series of distributed database middleware products such as Amoeba, Cobar, and MyCat have emerged. This middleware lets users build a distributed database cluster and migrate stand-alone databases and applications to the cloud, and it is becoming an important solution for distributed data management. However, distributed database middleware is still immature: it performs poorly on complex queries such as joins, aggregate operations, and repeated queries on the same table. This thesis therefore focuses on how to improve query efficiency in distributed database middleware.
First, this thesis describes the concept of distributed database middleware and the development of distributed query optimization both in China and abroad. Based on the query mode of distributed database middleware, it analyzes key issues such as the methods and goals of query optimization in distributed database middleware.
Next, to reduce data transmission during query processing, the thesis proposes a relation-based distributed data partitioning method. Its main strategy is to construct a data set dependency model graph. The data are then partitioned according to the primary key of the fact table, and each dimension table is partitioned according to the column through which it joins the fact table. The partitioning aims to keep all data needed by a query local to one node as far as possible. The thesis also analyzes the storage, querying, and other aspects of the method in detail.
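A minimal sketch of the partitioning idea described above, under stated assumptions: a star schema held in memory as lists of dicts, fact rows placed by their integer primary key modulo the node count, and each dimension row copied to every node holding a fact row that joins with it (unreferenced dimension rows fall back to the same modulo rule). The `DEPENDENCIES` dict stands in for the dependency model graph; it and the names `node_for`, `partition_fact`, and `partition_dimension` are hypothetical illustrations, not the thesis's exact algorithm or MyCat's sharding rules.

```python
from collections import defaultdict

# Hypothetical stand-in for the data set dependency model graph:
# each dimension table maps to the fact-table column that joins to it.
DEPENDENCIES = {"customer": "c_id", "part": "p_id"}


def node_for(key, num_nodes):
    """Assumed placement rule: integer key modulo the number of nodes."""
    return key % num_nodes


def partition_fact(fact_rows, pk, num_nodes):
    """Place each fact-table row on the node chosen by its primary key."""
    nodes = defaultdict(list)
    for row in fact_rows:
        nodes[node_for(row[pk], num_nodes)].append(row)
    return nodes


def partition_dimension(dim_rows, dim_key, fact_nodes, join_col, num_nodes):
    """Copy each dimension row to every node holding a fact row that joins
    with it, so fact-dimension joins can be evaluated locally."""
    referenced_on = defaultdict(set)  # dimension key -> nodes that need it
    for node, rows in fact_nodes.items():
        for row in rows:
            referenced_on[row[join_col]].add(node)

    nodes = defaultdict(list)
    for row in dim_rows:
        # Unreferenced rows still need a home: fall back to the modulo rule.
        targets = referenced_on.get(row[dim_key]) or {node_for(row[dim_key], num_nodes)}
        for node in sorted(targets):
            nodes[node].append(row)
    return nodes


orders = [
    {"o_id": 0, "c_id": 10, "p_id": 1},
    {"o_id": 1, "c_id": 11, "p_id": 1},
    {"o_id": 2, "c_id": 10, "p_id": 2},
]
customers = [{"c_id": 10}, {"c_id": 11}, {"c_id": 12}]
parts = [{"p_id": 1}, {"p_id": 2}]

fact_nodes = partition_fact(orders, "o_id", num_nodes=2)
cust_nodes = partition_dimension(customers, "c_id", fact_nodes, DEPENDENCIES["customer"], 2)
part_nodes = partition_dimension(parts, "p_id", fact_nodes, DEPENDENCIES["part"], 2)
```

Here part 1 is referenced by orders on both nodes and is therefore stored twice, while customer 12, which no order references, falls back to node 0 (12 mod 2). Replicating referenced dimension rows trades extra storage for local joins, which is one plausible reading of partitioning "according to the join column", not necessarily the thesis's exact rule.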
The method is evaluated in the open-source distributed database middleware MyCat using the TPC-H and SSB data sets. Comparison and analysis show that it effectively reduces data transmission in the distributed database middleware and improves query efficiency.
Finally, this thesis proposes a query execution optimization method for distributed database middleware based on incremental updates. The method stores historical query records and historical result sets and combines them with incremental updates to the data tables: when the same query is issued again on the same table, only the incremental result set is computed, and each data node merges its historical result set with the incremental result set to obtain the final result. To reduce system coupling, the local database nodes manage the query logs and historical result sets, while the distributed database middleware controls query decomposition and result generation. The thesis also uses a mathematical model to verify the validity of the method. Experiments applying the incremental update strategy to queries on the TPC-H data verify the effectiveness of the method and show that it improves the performance of the distributed database middleware.
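The incremental strategy described above can be illustrated with a small sketch for one decomposable aggregate (AVG, split into COUNT and SUM). It assumes append-only tables whose row ids increase with every insert, uses the query text only as a cache key (the aggregate computed is fixed), and keeps the query log in memory. `DataNode`, `count_and_sum`, and `middleware_avg` are hypothetical names, not MyCat APIs or the thesis's code. Following the division of labor in the abstract, each node keeps its own log and merges historical and incremental results, while the middleware decomposes the query and combines the node results.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """One data node holding an append-only table of (row_id, value) rows
    and a hypothetical query log of historical results."""
    rows: list = field(default_factory=list)
    # query text -> (largest row_id covered, count, sum)
    history: dict = field(default_factory=dict)
    last_delta: int = 0  # size of the increment handled by the latest call

    def insert(self, row_id, value):
        # Assumption: row ids grow monotonically; rows are never updated or deleted.
        self.rows.append((row_id, value))

    def count_and_sum(self, query):
        """Merge the cached historical result for `query` with a result
        computed only over rows inserted after that result was stored."""
        covered, count, total = self.history.get(query, (-1, 0, 0))
        newest, delta = covered, 0
        # Only rows newer than `covered` contribute; a real database could
        # push this down as a predicate such as WHERE id > covered.
        for row_id, value in self.rows:
            if row_id > covered:
                count += 1
                total += value
                delta += 1
                newest = max(newest, row_id)
        self.history[query] = (newest, count, total)
        self.last_delta = delta
        return count, total


def middleware_avg(nodes, query):
    """Middleware side: decompose AVG into per-node COUNT and SUM, then
    combine the partial results into the final answer."""
    count = total = 0
    for node in nodes:
        c, t = node.count_and_sum(query)
        count += c
        total += t
    return total / count if count else None


nodes = [DataNode(), DataNode()]
for row_id, value in enumerate([10, 20, 30, 40]):
    nodes[row_id % 2].insert(row_id, value)
query = "SELECT AVG(v) FROM t"
first = middleware_avg(nodes, query)   # 25.0, every row is new
nodes[0].insert(4, 50)
second = middleware_avg(nodes, query)  # 30.0; node 0 handles one new row, node 1 none
```

This sketch does not handle updates or deletes, which would invalidate the cached totals.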
Keywords/Search Tags: distributed database middleware, distributed data management, query optimization, data partition, incremental query