Design And Implementation Of Massive Data Storage And Quasi-Real-Time Query System

Posted on:2016-09-21

Degree:Master

Type:Thesis

Country:China

Candidate:F F Qi

Full Text:PDF

GTID:2208330470952894

Subject:Computer application technology

Abstract/Summary:

Massive data storage and real-time query is becoming a hot topic in current research. Companies are generating massive quantities of data at speeds that demand a new approach to storing and analyzing datasets. Traditional database management systems can be cumbersome to scale and painfully slow to load data into, requiring many upfront schema decisions and preparations.

This paper uses the Kafka message queue, the Storm stream processing framework, and databases such as HBase, and improves the Impala big data query engine, to design and implement a system that meets the demand for high-throughput, reliable storage and near-real-time query of massive data. The system features high concurrency, high robustness, dynamic scaling, and fault tolerance, and is easy to use. It supports highly concurrent storage, and the improved Impala engine can query HBase snapshots directly, so queries no longer affect HBase performance; this makes the system suitable for complex query and statistical workloads.

First, data is serialized with Protocol Buffer and pushed into the Kafka message queue. The Storm stream processing system gets messages from the Kafka queue, with KafkaSpout emitting the message stream as the data source for the Storm components. The FilterBolt implemented in this paper filters out unsafe data, and HBaseBolt finally stores the data in a distributed file system. If storage fails because of an exception, the message is pulled and processed again until it succeeds, which improves the fault tolerance of the system. The Impala cluster and the HBase cluster share the same distributed file system. After the data is stored there, the system uses the distributed file system's frame sensing principle to distribute the data to the Impala and HBase clusters at the same time. Because the improved Impala engine queries HBase snapshots directly, Impala and HBase do not affect each other's performance, which improves the practicability of the system.

Finally, this paper builds an experimental system to carry out performance testing, monitoring concurrent performance and the high availability provided by fault tolerance, and compares its performance with that of Hive and HBase systems. The experimental data show that the new system supports storage with good performance and scalability.
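
The filter-then-store flow with replay on failure described in the abstract can be illustrated with a minimal sketch. This is not the thesis code and does not use the Storm, Kafka, or HBase APIs: the queue is a plain in-memory deque, and `FlakyStore`, `is_safe`, `run_pipeline`, the safety rule, and the `max_attempts` bound are all hypothetical names and assumptions introduced only for illustration (the abstract itself describes retrying until success, without a bound).

```python
from collections import deque


class FlakyStore:
    """Hypothetical stand-in for the storage layer.

    It rejects the first `failures_before_success` writes so that the
    replay path can be exercised without a real cluster.
    """

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.rows = []

    def put(self, record):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise IOError("simulated storage failure")
        self.rows.append(record)


def is_safe(record):
    # Assumed safety rule for the sketch; the abstract does not say
    # what the real FilterBolt treats as unsafe data.
    return "<script" not in record["payload"].lower()


def run_pipeline(messages, store, max_attempts=5):
    """Filter each message, then store it, re-queueing on failure.

    Returns the list of records rejected by the filter. A record whose
    write fails is put back on the queue and processed again, which
    gives at-least-once storage of every safe record.
    """
    queue = deque((message, 1) for message in messages)
    dropped = []
    while queue:
        record, attempt = queue.popleft()
        if not is_safe(record):
            dropped.append(record)  # filter step: unsafe data is not stored
            continue
        try:
            store.put(record)  # store step
        except IOError:
            if attempt >= max_attempts:
                raise  # assumption: give up after a bounded number of tries
            queue.append((record, attempt + 1))  # replay the message
    return dropped
```

Because failed records are re-queued behind later ones, they may be stored out of their original order; a design that needs ordering would have to retry in place instead.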

Keywords/Search Tags:

Kafka message queue, Storm stream processing framework, HBase distributed database, Impala big data search engine, Distributed file system frame sensing principle

Related items

1	The Design And Implementation Of Real-time Processing System For Device Log Stream Data Based On Storm
2	The Design And Implementation Of The Traffic Data Management Platform Based On HBase
3	Research And Implementation Of Data Processing Framework Of IoT Based On Storm
4	Research On Reliability Of Kafka Messaging System
5	The Research And Implementation Of Performance Modeling And Optimization Technology Of A Distributed Message System Named Kafka
6	Design And Implementation Of Customized Distributed Web Crawler
7	Design And Implementation Of Real-time Log Stream Processing System Based On Kafka And Storm
8	Design And Implementation Of Real-time Traffic Information Management System Based On Storm
9	Design And Implementation Of Multi-source Sensing And Emergency Linkage System For Smart City
10	Big Data Flow Processing Analysis System Based On Kafka