
Research On CouchDB Storage Plugin For Apache Drill

Posted on: 2021-03-26   Degree: Master   Type: Thesis
Country: China   Candidate: Y L Liao   Full Text: PDF
GTID: 2428330623973468   Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid growth of big data, traditional relational database systems have been unable to keep up with development needs. Non-relational databases have solved some of the problems that relational databases face. However, there are many types of non-relational databases, and their storage methods differ. As a result, various real-time big data query platforms have appeared. The mainstream platforms currently include Apache Drill, Apache Hive, Spark, and Impala. Compared with the other platforms, Drill has the advantage of extensible data sources: by implementing a storage plugin, Drill can be connected to different non-relational databases, and it can dynamically combine data from multiple data sources in a single query. Drill also offers standard SQL, which is unified, easy to learn, and widely used, as its query language. With these advantages, Drill has gradually become one of the most popular open source platforms for real-time big data queries.

Among document-oriented non-relational databases, Drill currently supports only MongoDB as a data source. However, MongoDB's custom storage format is hard to read, its application interface is not rich enough, and its availability is limited by memory usage issues. As a result, Drill is limited in querying and processing document-oriented databases. CouchDB is an emerging document-oriented database. Compared with MongoDB, CouchDB stores data in JSON format, runs in more operating system environments, and can be used from any language that supports HTTP requests. CouchDB has shortcomings of its own, however: its queries are disk-based, which results in low query performance, and its query method is relatively complicated and does not support standard SQL.

In response to these issues, this thesis does the following:

(1) It studies in depth the system architecture of Drill and the working principle of its storage plugins, as well as the architecture of the CouchDB database and its query methods.
(2) It designs and implements a CouchDB storage plugin for Drill, which allows SQL queries to be run against CouchDB through the Drill platform and thereby extends Drill's ability to manage document-oriented databases. In addition, Drill's columnar in-memory data structure, Value Vectors, is used to move CouchDB's disk-based query processing into memory, which improves the query efficiency of CouchDB.
(3) It compares and analyzes the performance of the MongoDB storage plugin and the CouchDB storage plugin for Drill.
(4) It discusses the Calcite optimizer used by Drill and designs optimization rules suited to CouchDB's rich query capabilities. The column trimming and predicate push-down rules reduce the processing of irrelevant data, which lowers memory consumption and improves data transmission efficiency.

The main contributions of this thesis are as follows:

(1) Based on in-depth research on Drill and CouchDB, this thesis designs and implements the CouchDB storage plugin for Drill. The source code of the CouchDB storage plugin project has been approved by Charles S. Givre, the PMC Chair of the Drill project, and submitted.
(2) This thesis extends Drill's support for document databases, making Drill more powerful in managing non-relational databases.
(3) This thesis uses the characteristics of Drill to make up for the shortcomings of CouchDB. Drill's columnar data structure can improve the query performance of CouchDB, and through the column trimming and predicate push-down rules, the filters in a SQL statement are pushed down to CouchDB as rich queries, which reduces memory consumption and network transmission costs. In addition, CouchDB data can be queried jointly with other non-relational databases through Drill.
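
To make the column trimming and predicate push-down idea concrete, here is a minimal Python sketch of how the projected columns and AND-ed filters of a SQL query might be translated into the body of a CouchDB Mango `_find` request. It is an illustration only, not the plugin's actual code (the plugin itself is not shown in this abstract): the function name `build_find_request`, the operator table, and the sample table and column names are all assumptions made for the example.

```python
import json

# Hypothetical mapping from SQL comparison operators to CouchDB Mango operators.
SQL_TO_MANGO = {
    "=": "$eq", "<>": "$ne",
    ">": "$gt", ">=": "$gte",
    "<": "$lt", "<=": "$lte",
}


def build_find_request(columns, filters, limit=None):
    """Build a body for POST /{db}/_find (illustrative helper, not the plugin's API).

    columns: names of the columns the SQL query actually reads (column trimming).
    filters: list of (field, sql_operator, value) tuples joined by AND
             (predicate push-down).
    """
    conditions = [{field: {SQL_TO_MANGO[op]: value}} for field, op, value in filters]
    if not conditions:
        # No filter to push down: use the common "match every document" selector.
        selector = {"_id": {"$gt": None}}
    elif len(conditions) == 1:
        selector = conditions[0]
    else:
        selector = {"$and": conditions}

    body = {"selector": selector}
    if columns:
        # Only the referenced fields are returned, so less data crosses the network.
        body["fields"] = list(columns)
    if limit is not None:
        body["limit"] = limit
    return body


# Assumed example query:
#   SELECT name, age FROM couch.users WHERE age >= 30 AND city = 'Chengdu'
request_body = build_find_request(
    columns=["name", "age"],
    filters=[("age", ">=", 30), ("city", "=", "Chengdu")],
)
print(json.dumps(request_body, indent=2))
```

With the filter and the field list evaluated inside CouchDB, only matching documents and referenced fields need to be transferred and loaded into memory, which is the effect the optimization rules aim for.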
Keywords/Search Tags:Big data, Drill, CouchDB, Storage Plugin, Query Optimization