Design And Implementation Of Real-time Reporting System Based On Big Data

Posted on:2017-03-11

Degree:Master

Type:Thesis

Country:China

Candidate:H Zhou

Full Text:PDF

GTID:2308330485460545

Subject:Software engineering

Abstract/Summary:

In the report-query scenario of a smart restaurant system, business users have a strong demand for summaries of business data, a need that a reporting system can readily meet. Generating reports from existing data, however, involves ad hoc queries over many database tables, and most of these queries contain table joins. With the traditional processing mode (direct access to the database), query response time can be optimized to a few seconds or tens of seconds while the data volume stays small (<10 million records). When the data grows beyond that, to hundreds of millions or even more than one billion records, neither tuning this mode nor changing the indexing mechanism can answer concurrent queries within seconds, and the database comes under heavy load. The author's internship company currently uses an offline calculation mode: data is imported into the data warehouse (Hive), result sets are computed offline, and queries then run against those result sets. The disadvantage of this mode is that it does not support ad hoc queries.

This thesis describes another processing mode that solves the above problems by introducing a distributed index layer; the mode is applicable to many ad hoc query scenarios over large data sets. The data synchronization module improves query performance by merging multiple relational database (MySQL) tables into a single wide table and by exploiting the fast query capability of a search engine (Solr). With 50 million records in the wide table and 20 concurrent requests, every query succeeded and returned a result within 2 seconds.
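
The wide-table idea can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the table names (`orders`, `members`, `shops`), their fields, and the functions `build_wide_rows` and `total_amount_by` are all hypothetical, and plain in-memory lists stand in for MySQL and Solr. The sketch only shows why a report query over pre-joined rows needs no join at query time.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_wide_rows(orders: List[Row], members: List[Row], shops: List[Row]) -> List[Row]:
    """Join each order with its member and shop once, producing flat documents.

    Assumption: orders reference members and shops by id (hypothetical schema).
    """
    members_by_id = {m["member_id"]: m for m in members}
    shops_by_id = {s["shop_id"]: s for s in shops}
    wide_rows = []
    for order in orders:
        member = members_by_id.get(order["member_id"], {})
        shop = shops_by_id.get(order["shop_id"], {})
        wide_rows.append({
            "order_id": order["order_id"],
            "amount": order["amount"],
            "member_id": order["member_id"],
            "member_name": member.get("name"),
            "shop_id": order["shop_id"],
            "shop_city": shop.get("city"),
        })
    return wide_rows


def total_amount_by(wide_rows: List[Row], field: str) -> Dict[Any, float]:
    """Summarize order amounts by any wide-row field, with no join at query time."""
    totals: Dict[Any, float] = defaultdict(float)
    for row in wide_rows:
        totals[row[field]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    members = [{"member_id": 1, "name": "Li"}, {"member_id": 2, "name": "Wang"}]
    shops = [{"shop_id": 10, "city": "Hangzhou"}, {"shop_id": 20, "city": "Shanghai"}]
    orders = [
        {"order_id": "o1", "member_id": 1, "shop_id": 10, "amount": 30.0},
        {"order_id": "o2", "member_id": 2, "shop_id": 10, "amount": 20.0},
        {"order_id": "o3", "member_id": 1, "shop_id": 20, "amount": 50.0},
    ]
    rows = build_wide_rows(orders, members, shops)
    print(total_amount_by(rows, "shop_city"))
    print(total_amount_by(rows, "member_name"))
```

In a real deployment the flat documents would be written to the search index rather than kept in a list, so that the cost of the join is paid once at synchronization time instead of on every report query.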
Neither the traditional processing mode (direct access to the database) nor the offline calculation mode can achieve the query speed and real-time results of the index-based mode.

The thesis elaborates on the design and implementation of the full data synchronization module, the incremental data synchronization module, and the reports business module. Only when the full and incremental data synchronization modules work together is the data in Solr accurate and up to date, so that users get real-time report data when querying through the reports business module. The full data synchronization module runs multiple threads under smart scheduling, which greatly increases the speed of data synchronization. The incremental (real-time) data synchronization module is built on Alibaba's MySQL data synchronization component and messaging middleware, and ensures that incremental data is synchronized to the distributed index in real time.

The author independently completed the following sub-modules: in the full data synchronization module, child table importing, Hive binding, Hive wide table merging, and index file generating; in the incremental data synchronization module, the incremental message provider and the incremental message consumer; and in the reports business module, the member report.
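
The provider/consumer split of the incremental data synchronization module can be sketched as follows. This is an illustration under stated assumptions, not the thesis's code: `ChangeEvent`, `publish`, and `consume` are hypothetical names, Python's standard `queue.Queue` stands in for the messaging middleware, and a dictionary keyed by document id stands in for the distributed index. No real Solr or MySQL synchronization API is used.

```python
import queue
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class ChangeEvent:
    """One row change captured from the source database (hypothetical shape)."""
    op: str                      # "insert", "update" or "delete"
    key: str                     # document id in the index
    fields: Dict[str, Any] = field(default_factory=dict)


def publish(events: Iterable[ChangeEvent], channel: "queue.Queue[ChangeEvent]") -> None:
    """Provider side: push change events onto the channel in commit order."""
    for event in events:
        channel.put(event)


def consume(channel: "queue.Queue[ChangeEvent]", index: Dict[str, Dict[str, Any]]) -> int:
    """Consumer side: drain the channel and apply each change to the index.

    Returns the number of events applied.
    """
    applied = 0
    while True:
        try:
            event = channel.get_nowait()
        except queue.Empty:
            return applied
        if event.op == "delete":
            index.pop(event.key, None)
        elif event.op in ("insert", "update"):
            # Merge changed fields into the existing document, creating it if absent.
            index.setdefault(event.key, {}).update(event.fields)
        else:
            raise ValueError(f"unknown operation: {event.op}")
        applied += 1


if __name__ == "__main__":
    channel: "queue.Queue[ChangeEvent]" = queue.Queue()
    index: Dict[str, Dict[str, Any]] = {}
    publish(
        [
            ChangeEvent("insert", "o1", {"amount": 10}),
            ChangeEvent("update", "o1", {"amount": 12}),
        ],
        channel,
    )
    print(consume(channel, index), index)
```

Applying events in the order they were committed matters here: if an update were applied before the insert it follows, or after a delete, the index would drift from the source tables.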

Keywords/Search Tags:

Huge Data, Big Data, Real-time Report, Ad Hoc Query, Distributed Index

Related items

1	Research On Efficient Distributed Storage And Query Algorithm For Real-time Data Stream
2	The Design And Realization Of Real-time Logistics Data Query System Based On Storm
3	Research On Techniques And Systems For Index And Query Optimization Of Big Data
4	Study On Key Techniques Of Real-time Data Management In Wireless Sensor Networks
5	Design And Implementation Of Real-Time Data Processing System For Tickets
6	Research On Distributed Trajectory Data Index And Query Technology
7	Key Technologies Of Real-time Query Engine For Astronomical Data
8	Application Of Real-time Data Fast Algorithm For Oil And Gas Production Index
9	Index Technologies For Real-Time Databases
10	HBase-based Storage And Query System For Traffic Checkpoints Data