The Design And Implementation Of A Real-time Query System For Massive Data Based On Spark

Posted on:2018-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:X J Liu

Full Text:PDF

GTID:2348330518995289

Subject:Computer technology

Abstract/Summary:

With the rapid development of the Internet and distributed systems, we have entered the era of "big data". Today’s big data is characterized by large volume and high propagation velocity, and its value falls sharply as time passes. These characteristics pose great challenges for big data processing. Batch processing based on Hadoop MapReduce can handle large volumes of data, but jobs typically run at hourly intervals, so it cannot process data in real time and no longer meets the requirements of real-time data query.

To address the high real-time requirements of big data processing, this thesis designs and implements a distributed real-time data processing system based on Spark and HBase. The system achieves real-time data transformation and query conversion and improves usability. The main work of this thesis includes:

1. Optimizing the HDFS file storage policy: the workload of the DataNodes is taken into account when files are distributed. This reduces hot spots and unnecessary file movement, which increases the parallelism of data computation and improves real-time performance.
2. Implementing a general, configurable real-time data conversion program: the source data format, the source field conversion rules, and the filtering rules are set to define the logic of a task, which avoids repeated development.
3. Providing a secondary index for HBase: the index is built using MapReduce, and an HBase coprocessor intercepts CRUD operations on the HTable to ensure that the data in the secondary index stays correct.
4. Adding an SQL query interface for HBase: SQL statements are parsed, the schema is converted between relational tables and HTables, and the logic of each SQL statement is translated into HBase operations.
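The idea behind item 3, keeping a secondary index correct by intercepting every write to the table, can be illustrated with a small sketch. The Python code below is not the thesis's implementation and does not use the HBase API. It is an in-memory stand-in, assuming a single indexed column, in which `put` and `delete` play the role that coprocessor hooks would play on an HTable. The class name `IndexedTable` and all of its methods are hypothetical.

```python
from collections import defaultdict


class IndexedTable:
    """Toy in-memory table with one secondary index (hypothetical, not HBase)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                 # row key -> {column: value}
        self.index = defaultdict(set)  # indexed column value -> set of row keys

    def _unindex(self, row_key):
        # Drop the index entry that points at the row's current indexed value.
        old = self.rows.get(row_key, {}).get(self.indexed_column)
        if old is not None:
            self.index[old].discard(row_key)
            if not self.index[old]:
                del self.index[old]

    def put(self, row_key, columns):
        # Intercept the write: remove the stale index entry before updating,
        # then index the new value, so index and data never disagree.
        if self.indexed_column in columns:
            self._unindex(row_key)
        self.rows.setdefault(row_key, {}).update(columns)
        new = self.rows[row_key].get(self.indexed_column)
        if new is not None:
            self.index[new].add(row_key)

    def delete(self, row_key):
        # Intercept the delete: remove the index entry along with the row.
        self._unindex(row_key)
        self.rows.pop(row_key, None)

    def find_by_indexed(self, value):
        # Look up row keys through the index instead of scanning all rows.
        return sorted(self.index.get(value, ()))


table = IndexedTable("city")
table.put("r1", {"city": "Beijing", "name": "a"})
table.put("r2", {"city": "Beijing"})
table.put("r1", {"city": "Shanghai"})      # update moves r1 in the index
table.delete("r2")                         # delete removes r2 from the index
print(table.find_by_indexed("Beijing"))    # []
print(table.find_by_indexed("Shanghai"))   # ['r1']
```

In a real deployment the index would live in its own table and the hooks would run on the region servers. The sketch only shows why every create, update, and delete has to pass through the same interception point.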

Keywords/Search Tags:

real-time, query, Spark, HBase

Related items

1	The Design And Implementation Of Real-Time Query System For Mass Data Based On Hbase
2	Design And Implementation Of The HBase Based Flight Real-time Tracking System
3	HBase-based Storage And Query System For Traffic Checkpoints Data
4	Research And Application Of Query Optimization Based On HBase
5	Design And Implementation Of Spark-based RDF Streaming Data Real-time Query System
6	Design And Implementation Of Real-time Recommendation System Based On Spark
7	Research And Implementation Of Key Technologies For Real-time Logistics Big Data Analysis
8	Design And Implementation Of Real-Time Distributing System Of Subway Advertisement Based On Spark
9	Research On Real-Time Query Processing In Cloud Computing For Terms In Data Streams
10	Design And Implementation Of Massive Log Data Quasi-Real-Time Query System Based On Hadoop