Application Research Of Distributed Query And Optimization Method Based On Metadata

Posted on:2015-01-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Ceng

Full Text:PDF

GTID:2268330425482058

Subject:Computer software and theory

Abstract/Summary:

As data and business logic grow more complicated, the queries needed to retrieve data that meets given conditions also become more complex. To query distributed data, programmers must know many details about it, such as where it is stored, how it is stored, and its storage structure, and they must call many interfaces to obtain the relevant data. This takes considerable programming effort and requires the programmer to be highly familiar with the data interfaces. If a uniform data programming interface can be provided to programmers, it will shield the back-end access details and greatly improve programming efficiency.

This thesis studies a distributed query method based on metadata, in which metadata is used to define and manage virtual tables that contain the key information about each data source. Two query and optimization solutions are then designed for different data scales: one for common data and one for big data. For common data, queries are implemented with the virtual table, a syntax analysis tree, and an in-memory database, and are optimized by copying, moving, and splitting branches of the virtual SQL query syntax tree. For massive data, queries are implemented with Pig, Hadoop, and Python, and are optimized by tuning the Pig code, using multiple processes for file merging and for file upload and download in HDFS, building indexes for high-frequency business queries, and similar measures.

Metadata is used to build virtual tables that provide a unified query over distributed data sources. The LEMON parser generator is used to parse and check the SQL statements that users submit against the virtual tables. For common data queries, the syntax tree is used for semantic optimization, and an in-memory database is used to merge the results from multiple sources.
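The common-data path described above (metadata-defined virtual table, results from several sources merged in an in-memory database) can be illustrated with a minimal sketch. This is not the thesis's implementation: the metadata layout, the names `VIRTUAL_TABLES`, `make_demo_sources`, and `query_virtual`, and the use of SQLite as both the stand-in sources and the in-memory database are all assumptions made for illustration, and the LEMON parsing and syntax-tree optimization steps are omitted.

```python
import sqlite3

# Hypothetical metadata: one virtual table mapped onto two physical sources
# whose table and column names differ.
VIRTUAL_TABLES = {
    "orders": {
        "columns": ["order_id", "amount"],
        "sources": [
            {"name": "east", "table": "t_order", "columns": ["id", "total"]},
            {"name": "west", "table": "order_log", "columns": ["oid", "amt"]},
        ],
    }
}


def make_demo_sources():
    """Create two independent SQLite databases standing in for remote sources."""
    east = sqlite3.connect(":memory:")
    east.execute("CREATE TABLE t_order (id INTEGER, total REAL)")
    east.executemany("INSERT INTO t_order VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
    west = sqlite3.connect(":memory:")
    west.execute("CREATE TABLE order_log (oid INTEGER, amt REAL)")
    west.executemany("INSERT INTO order_log VALUES (?, ?)", [(3, 7.25), (4, 40.0)])
    return {"east": east, "west": west}


def query_virtual(sql, virtual_table, connections, metadata=VIRTUAL_TABLES):
    """Run `sql` against a virtual table by merging source rows in memory."""
    meta = metadata[virtual_table]
    merged = sqlite3.connect(":memory:")
    cols = ", ".join(meta["columns"])
    merged.execute(f"CREATE TABLE {virtual_table} ({cols})")
    placeholders = ", ".join("?" for _ in meta["columns"])
    for src in meta["sources"]:
        # Read each source using its own physical table and column names.
        src_cols = ", ".join(src["columns"])
        rows = connections[src["name"]].execute(
            f"SELECT {src_cols} FROM {src['table']}"
        ).fetchall()
        # Load the rows under the virtual table's unified column names.
        merged.executemany(
            f"INSERT INTO {virtual_table} VALUES ({placeholders})", rows
        )
    result = merged.execute(sql).fetchall()
    merged.close()
    return result
```

With the demo sources, `query_virtual("SELECT order_id, amount FROM orders WHERE amount > 9 ORDER BY order_id", "orders", make_demo_sources())` returns `[(1, 10.0), (2, 25.5), (4, 40.0)]`: the caller writes SQL against the virtual table only and never sees the per-source table or column names.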
For big data queries, Pig is used to generate scripts and submit tasks, and Hadoop is used for distributed computation and querying. Multiple processes handle the merging of small HDFS files and file upload and download, which reduces the load on the NameNode and improves upload and download speed. Indexes built for high-frequency business queries allow data to be found quickly and reduce the amount of information the program must load. Together these measures achieve the intended query optimization.

The methods studied in this thesis hide the complex details of distributed data queries and provide users with a unified, simple SQL query interface. They make combining distributed data queries more convenient and effectively improve the execution efficiency of federated queries.
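The small-file merging step mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the thesis's code: it works on local files as a stand-in for files staged before upload to HDFS, and the function names, the greedy grouping rule, and the 64 MiB default target size are all hypothetical. Storing fewer, larger files means the NameNode has fewer file entries to track, which is the effect the abstract describes.

```python
import os
from multiprocessing import Pool


def plan_groups(paths, target_bytes):
    """Greedily pack files into groups of roughly target_bytes each."""
    groups, current, size = [], [], 0
    for path in sorted(paths):
        file_size = os.path.getsize(path)
        # Close the current group once adding this file would exceed the target.
        if current and size + file_size > target_bytes:
            groups.append(current)
            current, size = [], 0
        current.append(path)
        size += file_size
    if current:
        groups.append(current)
    return groups


def merge_group(job):
    """Concatenate one group of small files into a single output file."""
    out_path, group = job
    with open(out_path, "wb") as out:
        for path in group:
            with open(path, "rb") as src:
                out.write(src.read())
    return out_path


def merge_small_files(paths, out_dir, target_bytes=64 * 1024 * 1024, workers=4):
    """Merge groups in parallel worker processes; returns the merged paths."""
    os.makedirs(out_dir, exist_ok=True)
    jobs = [
        (os.path.join(out_dir, f"merged-{i:05d}.dat"), group)
        for i, group in enumerate(plan_groups(paths, target_bytes))
    ]
    with Pool(processes=workers) as pool:
        return pool.map(merge_group, jobs)
```

On platforms that start worker processes by spawning a fresh interpreter, `merge_small_files` should be called from under an `if __name__ == "__main__":` guard. The merged files would then be uploaded in a separate step, which this sketch leaves out.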

Keywords/Search Tags:

Distributed, federated query, memory database, Hadoop, syntax tree

Related items

1	Research And Application Of Query Technology For Federated Database System
2	Design And Implementation Of Acceleration Method For Massive Distributed In-Memory Database Query Engine
3	Optimizing Query Processing In Distributed In-Memory Databases
4	Design And Implementation Of Query Optimization Module For Distributed Column Database Based On Memory
5	Research On Query And Retrieval Techniques On Distributed Knowledge Graph
6	A Query Optimizer For The Column-Oriented Distributed In-Memory Database System
7	Massive Distributed In-memory Columnar Database Query Engine For On-line Analytical Processing
8	Instance-level integration, query processing and optimization in federated database systems
9	Query System Over Distributed Memory Cloud
10	RDMA-based Distributed Memory Database Query Engine