
Design And Implementation Of Real-time Reporting System Based On Big Data

Posted on: 2017-03-11; Degree: Master; Type: Thesis
Country: China; Candidate: H Zhou
GTID: 2308330485460545; Subject: Software Engineering
Abstract/Summary:
In the report-query scenario of the smart restaurant system, business users need summaries of business data, and a reporting system can readily meet that need. Generating report data from existing data, however, means running ad hoc queries over many database tables, most of which involve table joins. In the traditional processing mode (querying the database directly), response times can be tuned to a few seconds or tens of seconds while the data volume stays below 10 million records. Once the data grows to tens of millions, hundreds of millions, or even more than one billion records, neither optimizing this mode nor changing the indexing scheme can deliver concurrent queries that return within seconds, and the database comes under heavy load. The company where the author interned currently uses an offline computation mode: data are imported into a data warehouse (Hive), computed offline into result sets, and queries are then served from those result sets. The drawback of the offline mode is that it does not support ad hoc queries. This thesis describes another processing mode, widely used for ad hoc queries over large data, that solves these problems by introducing a distributed index layer. To improve query performance, the data synchronization module merges many relational database (MySQL) tables into a single wide table, and queries exploit the fast retrieval of a search engine (Solr). With 50 million records in the wide table and 20 concurrent requests, every query succeeded and returned its result within 2 seconds.
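To illustrate what merging several relational tables into one wide table means, here is a minimal Python sketch. The tables (`orders`, `shops`, `members`, `order_items`), their columns, and the helper names `build_wide_rows` and `report_total` are hypothetical restaurant data and functions invented for illustration, not the thesis's schema or code; the thesis performs the merge in Hive and indexes the result in Solr, whereas this sketch only uses in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical normalized tables, as rows fetched from MySQL (illustration only).
orders = [
    {"order_id": 1, "shop_id": 10, "member_id": 100, "amount": 58.0},
    {"order_id": 2, "shop_id": 10, "member_id": None, "amount": 22.5},
]
shops = {10: {"shop_name": "Downtown"}}
members = {100: {"member_name": "Li", "member_level": "gold"}}
order_items = [
    {"order_id": 1, "dish": "noodles", "qty": 2},
    {"order_id": 1, "dish": "tea", "qty": 1},
    {"order_id": 2, "dish": "dumplings", "qty": 1},
]


def build_wide_rows(orders, shops, members, order_items):
    """Denormalize orders into one flat document per order (one wide-table row)."""
    items_by_order = defaultdict(list)
    for item in order_items:
        items_by_order[item["order_id"]].append(item)

    wide = []
    for order in orders:
        doc = dict(order)
        # Many-to-one joins: copy parent attributes into the row (left-join semantics,
        # so an order without a member simply gets no member fields).
        doc.update(shops.get(order["shop_id"], {}))
        doc.update(members.get(order["member_id"], {}))
        # One-to-many join: flatten child rows into multi-valued fields.
        items = items_by_order[order["order_id"]]
        doc["dishes"] = [i["dish"] for i in items]
        doc["item_count"] = sum(i["qty"] for i in items)
        wide.append(doc)
    return wide


def report_total(rows, **filters):
    """Sum order amounts over wide rows matching every field=value filter; no joins needed."""
    return sum(r["amount"] for r in rows if all(r.get(k) == v for k, v in filters.items()))


rows = build_wide_rows(orders, shops, members, order_items)
print(report_total(rows, shop_name="Downtown"))
```

Once every order is a self-contained document, a report filter such as "gold-member orders at one shop" becomes a lookup over a single collection with no joins at query time, which is what lets the search engine answer quickly. The one-to-many children become list-valued fields, which map naturally onto Solr fields declared `multiValued`.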
Neither the traditional mode (direct database access) nor the offline computation mode can achieve this query speed together with real-time results. The thesis elaborates on the design and implementation of the full data synchronization module, the incremental data synchronization module, and the report business module. Only when the full and incremental synchronization modules work together is the data in Solr guaranteed to be accurate and up to date, so that users receive real-time report data when querying through the report business module. The full data synchronization module uses multiple threads with intelligent scheduling, which greatly speeds up data synchronization. The real-time (incremental) data synchronization module is built on Alibaba's MySQL data synchronization component and on message middleware, and ensures that incremental data is synchronized to the distributed index in real time. The author independently completed the following sub-modules: child-table import, Hive binding, Hive wide-table merging, and index-file generation in the full data synchronization module; the incremental message provider and incremental message consumer in the incremental data synchronization module; and the member report in the report business module.
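How the full (whole) and incremental synchronization modules cooperate can be sketched as follows. This is a minimal Python illustration under stated assumptions: an in-memory dictionary stands in for the Solr index, a `queue.Queue` stands in for the message middleware, and change events are plain dicts with hypothetical `op`/`id`/`fields` keys rather than real binlog records (Alibaba's open-source Canal is the usual binlog-reading component of this kind, but the abstract names neither the component nor the middleware). "Intelligent scheduling" is approximated by dispatching the largest partitions first to a thread pool; the thesis's actual scheduling policy is not described in the abstract. All class and function names are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryIndex:
    """Hypothetical stand-in for the Solr collection: documents keyed by unique id."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def upsert(self, doc_id, fields):
        with self._lock:
            self._docs.setdefault(doc_id, {}).update(fields)

    def delete(self, doc_id):
        with self._lock:
            self._docs.pop(doc_id, None)

    def get(self, doc_id):
        with self._lock:
            return dict(self._docs[doc_id]) if doc_id in self._docs else None


def full_sync(index, partitions, workers=4):
    """Full sync: load partitions of (doc_id, fields) rows in parallel threads."""
    # Dispatch the largest partitions first so no thread is left with a long tail.
    ordered = sorted(partitions, key=len, reverse=True)

    def load(part):
        for doc_id, fields in part:
            index.upsert(doc_id, fields)
        return len(part)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(load, ordered))


def provide(events, q):
    """Incremental message provider: publish change events in commit order."""
    for event in events:
        q.put(event)
    q.put(None)  # sentinel marking the end of the stream in this sketch


def consume(q, index):
    """Incremental message consumer: apply events to the index in arrival order."""
    applied = 0
    while True:
        event = q.get()
        if event is None:
            break
        if event["op"] == "delete":
            index.delete(event["id"])
        else:  # inserts and updates are both applied as upserts
            index.upsert(event["id"], event["fields"])
        applied += 1
    return applied
```

A usage run loads a snapshot with `full_sync(index, partitions)`, then starts `provide` in a `threading.Thread` and calls `consume(q, index)` on the main thread. A single consumer preserves commit order, which matters because an update followed by a delete must not be applied in reverse. Because each event is applied as an upsert or delete keyed by document id, events captured while the full load was running can be replayed afterwards and the index converges to the source state once the backlog is drained; scaling out the consumer would require partitioning events by document id so that per-document order is kept.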
Keywords/Search Tags: Huge Data, Big Data, Real-time Report, Ad Hoc Query, Distributed Index