The Design and Implementation of a Real-Time Query System for Mass Data Based on HBase

Posted on: 2014-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhuo
Full Text: PDF
GTID: 2248330398970830
Subject: Electronic Science and Technology
Abstract/Summary:
Storing and querying massive data sets has been an active research topic in industry in recent years. Because of their limited scalability and performance, traditional relational databases cannot meet these requirements. Emerging NoSQL databases scale well but do not support traditional database features such as SQL and secondary indexes. NewSQL systems, which combine the high scalability of NoSQL with traditional database features, are therefore better suited to mass data storage and query.
Based on the NoSQL database HBase, this paper designs and implements a NewSQL system that meets the needs of real-time queries over mass data. It retains the availability, scalability, fault tolerance, and other characteristics of the underlying system, and adds SQL support for ease of use, secondary indexes, and real-time data queries. First, the SQL command parser analyzes the user's input. The schema converter then translates the input into HBase column families and qualifiers, and the query planner plans the processing of each type of SQL statement. Finally, the planned operations are carried out. This paper uses JSQLParser as the SQL command parser, completes the schema conversion through a normalized field format, and uses the MapReduce framework to implement database migration and backup. To improve query efficiency, this paper uses existing Coprocessor components for aggregate-function queries and for deletion by attribute condition, and develops, on top of the Coprocessor framework, an attribute update component and a component that generates the index in real time. This paper also provides a MapReduce-based index generation component that builds the index offline in order to ensure data consistency.
Finally, this paper builds an experimental system to carry out performance testing and compares its performance with that of a Hive+HBase system. From the experimental data, this paper concludes that the new system supports SQL statements and secondary indexes and provides good performance and scalability.
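The secondary-index idea named in the abstract can be illustrated with a small in-memory sketch. This is not the thesis's implementation: the `cf:qualifier` naming, the index row-key layout (indexed value, a separator, then the data row key), and every class and function name below are assumptions made for illustration. A sorted Python list stands in for the lexicographically ordered row keys of an HBase table.

```python
import bisect

SEP = "\x00"  # assumed separator between the indexed value and the data row key


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # sorted row keys
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, cells):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(cells)

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        """Yield row keys that start with prefix, in sorted order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i]
            i += 1


def to_column(field, family="cf"):
    """Map an SQL field name to 'family:qualifier' (hypothetical convention)."""
    return f"{family}:{field.lower()}"


def put_with_index(data, index, row_key, record, indexed_field):
    """Write the record and a matching index row in the same call."""
    data.put(row_key, {to_column(f): v for f, v in record.items()})
    index.put(f"{record[indexed_field]}{SEP}{row_key}", {})


def query_by_indexed_value(data, index, value):
    """Answer WHERE indexed_field = value with a prefix scan of the index table."""
    prefix = f"{value}{SEP}"
    return [data.get(k[len(prefix):]) for k in index.scan_prefix(prefix)]
```

With such a layout, an equality query on the indexed field becomes a short prefix scan of the index table followed by point lookups, rather than a full scan of the data table. The sketch does not remove stale index rows when an indexed value changes; a real system would need to handle that case.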
Keywords/Search Tags:HBase, SQL Parser, Secondary Index