Application Research Of Distributed Query And Optimization Method Based On Metadata

Posted on: 2015-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Ceng
Full Text: PDF
GTID: 2268330425482058
Subject: Computer software and theory
Abstract/Summary:
As data and business logic grow more complicated, the queries needed to retrieve the matching data also become more and more complex.

To query distributed data, programmers need to know many details about the data, such as its storage location, storage method, and storage structure, and they must call many interfaces to obtain the relevant data. This takes considerable programming effort and requires the programmer to be highly familiar with the data interfaces. If a uniform data programming interface can be provided to programmers, it will shield the back-end access details and greatly improve programming efficiency.

This paper studies a distributed query method based on metadata, in which metadata is used to define and manage virtual tables that contain the key information of the data sources. For the two data scales considered, common data and big data, two different query and optimization solutions are designed. For common data, queries are implemented with the virtual table, the syntax analysis tree, and a memory database, and are optimized by copying, moving, and splitting branches of the virtual SQL query syntax tree. For huge volumes of data, queries are implemented with Pig, Hadoop, and Python, and are optimized by tuning the Pig code, using multiple processes for file merging and for file upload and download in HDFS, building indexes for high-frequency business, and so on.

Metadata is used to build virtual tables that provide a unified query over distributed data sources. The LEMON grammar parser parses and checks the SQL statements that users submit against the virtual tables. For common data queries, the syntax tree is used for semantic optimization, and a memory database is used to merge the results from multiple sources.
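The common-data path described above (a metadata-defined virtual table whose per-source results are merged in a memory database) might be sketched as follows. This is only an illustrative sketch, not the thesis implementation: SQLite's in-memory mode stands in for the memory database, two further in-memory SQLite connections stand in for the distributed sources, and the names VIRTUAL_TABLES, make_demo_sources, run_virtual_query, and the orders tables are all hypothetical. No LEMON parsing or syntax-tree optimization is attempted here.

```python
import sqlite3

# Hypothetical metadata: one virtual table mapped onto two physical sources
# whose table and column names differ.
VIRTUAL_TABLES = {
    "orders": {
        "columns": ["order_id", "amount"],
        "sources": [
            {"name": "east", "table": "orders_east",
             "mapping": {"order_id": "id", "amount": "total"}},
            {"name": "west", "table": "orders_west",
             "mapping": {"order_id": "oid", "amount": "amt"}},
        ],
    }
}


def make_demo_sources():
    """Create two in-memory databases that play the role of remote sources."""
    east = sqlite3.connect(":memory:")
    east.execute("CREATE TABLE orders_east (id INTEGER, total REAL)")
    east.executemany("INSERT INTO orders_east VALUES (?, ?)",
                     [(1, 10.0), (2, 25.0)])
    west = sqlite3.connect(":memory:")
    west.execute("CREATE TABLE orders_west (oid INTEGER, amt REAL)")
    west.executemany("INSERT INTO orders_west VALUES (?, ?)", [(3, 40.0)])
    return {"east": east, "west": west}


def run_virtual_query(sql, virtual_table, connections):
    """Run `sql` (written against the virtual table) over all mapped sources."""
    meta = VIRTUAL_TABLES[virtual_table]
    cols = meta["columns"]
    # The merge database holds one table named after the virtual table.
    merged = sqlite3.connect(":memory:")
    merged.execute(f"CREATE TABLE {virtual_table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    for src in meta["sources"]:
        # Rename each physical column to its virtual name while fetching.
        select_list = ", ".join(f"{src['mapping'][c]} AS {c}" for c in cols)
        rows = connections[src["name"]].execute(
            f"SELECT {select_list} FROM {src['table']}").fetchall()
        merged.executemany(
            f"INSERT INTO {virtual_table} VALUES ({placeholders})", rows)
    # The user's SQL now sees a single, unified table.
    return merged.execute(sql).fetchall()


if __name__ == "__main__":
    sources = make_demo_sources()
    print(run_virtual_query(
        "SELECT order_id, amount FROM orders WHERE amount > 20 "
        "ORDER BY order_id", "orders", sources))
```

In this sketch every source row is copied into the merge database before the user's query runs. A fuller system would presumably push filters down to each source first, which is the kind of rewrite the syntax-tree optimization is meant to enable.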
For big data queries, Pig is used to generate scripts and submit tasks, and Hadoop is used for distributed computation and query. Multiple processes merge small HDFS files and handle file upload and download, which reduces the load on the NameNode and improves upload and download speed. Indexes on high-frequency business make it possible to find data quickly and reduce the amount of information the program has to load. These solutions achieve the intended query optimization.

The method studied in this paper hides the complex details of distributed data queries and provides users with a unified, simple SQL query interface. It makes combining distributed data in a query more convenient and effectively improves the execution efficiency of federated queries.
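The Pig script generation step mentioned above can be illustrated with a small Python sketch. It is a hedged example rather than the generator used in the thesis: the function name build_pig_script, its parameters, and the example paths and field names are assumptions made for illustration. It emits a Pig Latin script that loads a delimited data set, filters it, projects the requested columns, and stores the result.

```python
def build_pig_script(input_path, schema, columns, condition, output_path,
                     delimiter=","):
    """Return Pig Latin text for a simple select-project query.

    schema    -- list of (field_name, pig_type) pairs, e.g. ("amount", "double")
    columns   -- field names to keep in the output
    condition -- Pig boolean expression as a string, or None for no filter
    """
    fields = ", ".join(f"{name}:{pig_type}" for name, pig_type in schema)
    lines = [
        f"raw = LOAD '{input_path}' USING PigStorage('{delimiter}') "
        f"AS ({fields});"
    ]
    current = "raw"
    if condition:
        # Filter before projecting so later steps handle fewer rows.
        lines.append(f"filtered = FILTER {current} BY {condition};")
        current = "filtered"
    lines.append(
        f"projected = FOREACH {current} GENERATE {', '.join(columns)};")
    lines.append(
        f"STORE projected INTO '{output_path}' "
        f"USING PigStorage('{delimiter}');")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pig_script(
        "/data/orders",
        [("order_id", "int"), ("amount", "double")],
        ["order_id"],
        "amount > 20.0",
        "/out/big_orders"))
```

The generated text could be written to a file and submitted to a cluster with the pig command-line client. The sketch does no validation of field names or expressions, which in the described system would presumably come from the parsed and checked SQL statement.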
Keywords/Search Tags: Distributed, federated query, memory database, Hadoop, syntax tree