Design And Implementation Of Massive Data Storage And Quasi-Real-Time Query System

Posted on: 2016-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: F F Qi
Full Text: PDF
GTID: 2208330470952894
Subject: Computer application technology
Abstract/Summary:
Massive data storage and real-time querying have become a hot spot in current research. Companies are generating massive quantities of data at speeds that demand a new approach to storing and analyzing datasets. Traditional database management systems can be cumbersome to scale and painfully slow to load data into, and they require many upfront schema decisions and preparations.
In this paper, the Kafka message queue, the Storm stream processing framework, and the HBase database are combined with an improved Impala big data query engine to design and implement a system that meets the demand for high-throughput, reliable storage and near-real-time querying of massive data. The system offers high concurrency, robustness, dynamic scaling, and fault tolerance, is easy to use, and supports highly concurrent writes. The improved Impala engine queries HBase snapshots directly, so queries no longer affect HBase performance, and it can serve complex query and statistics workloads.
First, data is serialized with Protocol Buffers and pushed into the Kafka message queue. The Storm stream processing system reads messages from the Kafka queue: KafkaSpout acts as the data source and emits the message stream to the downstream Storm components. The FilterBolt implemented in this paper filters out unsafe data, and HBaseBolt finally stores the data in the distributed file system. If a write fails because of an exception, the message is pulled and processed again until it succeeds, which improves the fault tolerance of the system. The Impala cluster and the HBase cluster share the same distributed file system. Once the data is stored there, the system relies on the rack-awareness principle of the distributed file system to distribute the data to the Impala and HBase clusters at the same time. Because the improved Impala engine can query HBase snapshots directly, Impala and HBase do not affect each other's performance, which improves the practicality of the system.
Finally, an experimental system is built to carry out performance testing and to monitor concurrent performance and the high availability provided by fault tolerance. Its performance is compared with that of Hive and HBase systems. The experimental data show that the new system delivers good storage performance and scalability.
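The ingestion path described in the abstract (decode, filter unsafe data, store, and reprocess a message when the write fails) can be illustrated with a small self-contained sketch. This is not the thesis's implementation: an in-memory `deque` stands in for the Kafka queue, JSON stands in for Protocol Buffers, and `FlakyStore`, `is_safe`, and `run_pipeline` are hypothetical names introduced only for illustration, as is the filter rule itself.

```python
import json
from collections import deque


class FlakyStore:
    """Hypothetical stand-in for the HBase write path; fails on chosen attempts."""

    def __init__(self, fail_on_attempts):
        self.fail_on_attempts = set(fail_on_attempts)
        self.attempts = 0
        self.rows = {}

    def put(self, key, value):
        self.attempts += 1
        if self.attempts in self.fail_on_attempts:
            raise IOError("simulated storage failure")
        self.rows[key] = value


def is_safe(record):
    """Hypothetical filter rule (assumption): reject payloads containing a script tag."""
    return "<script" not in record["payload"].lower()


def run_pipeline(queue, store):
    """Drain the queue: decode, filter, store; replay a message when the write fails."""
    dropped = 0
    while queue:
        raw = queue.popleft()
        record = json.loads(raw)  # stands in for Protocol Buffers decoding
        if not is_safe(record):
            dropped += 1  # the filter step discards unsafe data
            continue
        try:
            store.put(record["id"], record["payload"])
        except IOError:
            queue.append(raw)  # write failed, so the message is processed again
    return dropped


queue = deque(
    json.dumps(r)
    for r in [
        {"id": "row1", "payload": "hello"},
        {"id": "row2", "payload": "<script>alert(1)</script>"},
        {"id": "row3", "payload": "world"},
    ]
)
store = FlakyStore(fail_on_attempts=[1])  # the first write attempt fails
dropped = run_pipeline(queue, store)
print(sorted(store.rows), dropped, store.attempts)
```

In this sketch the first write of `row1` fails and the message is replayed after the others, so both safe rows end up stored while the unsafe one is dropped. A real topology would typically bound or delay retries rather than replay immediately.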
Keywords/Search Tags: Kafka message queue, Storm stream processing framework, HBase distributed database, Impala big data search engine, Distributed file system rack-awareness principle