Method And Implementation For Hive-Based Offline Data Processing

Posted on:2017-07-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Zhu

GTID:2348330491464429

Subject:Software engineering

Abstract/Summary:

With the rapid growth of offline data and business volume, traditional database technology and simple Hadoop-based distributed computing methods incur huge overheads and long waiting times for web page queries, which seriously degrades the user experience.

This thesis proposes an offline data processing method based on Hadoop and Hive, in which jobs are started by Java timing tasks. Taking into account the real-time requirements of different jobs, their running times are spread across different time periods to balance system performance. Each offline data processing procedure is treated as a job, and every job is divided into several tasks. Jobs are triggered by the timing tasks according to their stored information, such as identifiers, start times, and cycle intervals: each minute, the timing tasks query this information and start the jobs that are due, so that different types of jobs begin to execute. A multi-dimensional computing method is developed for complex statistical reporting jobs, and task templates are extracted from similar job executions to improve reusability.

The proposed methods are applied to an API Open Platform. Results show that they reduce the storage space consumed by redundant offline data and improve the protection of consumers' rights by predicting user fraud. In addition, splitting report data into multi-dimensional statistics greatly reduces the time cost of report queries, and the resulting shorter waiting times for web page queries improve the user experience.
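The abstract describes the trigger mechanism only in outline: once a minute, a timing task queries each job's identifier, start time, and cycle interval and starts the jobs that are due. The Java sketch below illustrates one way such a minute-level trigger could work. The `JobRecord` type, the in-memory job list standing in for the job table, the rule that a job is due when a whole number of intervals has elapsed since its start, and `startJob` are all hypothetical and are not taken from the thesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TimingTaskSketch {

    // Hypothetical job record holding the fields the abstract names: an identifier,
    // a start time and a cycle interval, all in minutes since the Unix epoch.
    record JobRecord(String id, long startMinute, long intervalMinutes) {}

    // Assumed due rule: a job is due at `minute` if it has already started and a
    // whole number of cycle intervals has elapsed since its start time.
    static List<String> dueJobs(List<JobRecord> jobs, long minute) {
        List<String> due = new ArrayList<>();
        for (JobRecord job : jobs) {
            long elapsed = minute - job.startMinute();
            if (elapsed >= 0 && elapsed % job.intervalMinutes() == 0) {
                due.add(job.id());
            }
        }
        return due;
    }

    // Hypothetical placeholder for launching a job (e.g. running its Hive tasks in order).
    static void startJob(String id) {
        System.out.println("starting job " + id);
    }

    public static void main(String[] args) throws InterruptedException {
        // In-memory stand-in for the job table queried each minute.
        List<JobRecord> jobs = List.of(
                new JobRecord("daily_report", 120, 1440), // daily at 02:00 UTC
                new JobRecord("fraud_scan", 0, 30));      // every 30 minutes

        // Deterministic check of the due rule at epoch minute 120.
        System.out.println(dueJobs(jobs, 120)); // prints [daily_report, fraud_scan]

        // Timing task: every minute, read the clock and start whichever jobs are due.
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            long nowMinute = System.currentTimeMillis() / 60_000;
            dueJobs(jobs, nowMinute).forEach(TimingTaskSketch::startJob);
        }, 0, 1, TimeUnit.MINUTES);

        // Let the demo run briefly, then stop the timer so the program exits.
        TimeUnit.SECONDS.sleep(1);
        timer.shutdown();
    }
}
```

Because `scheduleAtFixedRate` does not guarantee that each run lands exactly on a minute boundary, a production version would probably need to remember the last minute it checked so that a delayed run neither skips nor repeats a minute. The sketch omits this.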

Keywords/Search Tags:

Off-line Data Processing, Task Templates, Hive, Distributed Framework, Timing Task

Related items

1	Design And Implementation Of Distributed Timing Task System Based On Quartz
2	Design And Implementation Of A Distributed Timing Task Scheduling System Based On Loosely Coupled Architecture
3	Research On Task Distribution Algorithms In Mobile Edge Computing
4	Design And Implementation Of Highly Available Distributed Task Scheduling And Execution System
5	Research And Implementation Of Efficient Task Scheduling Technology In Distributed Computing System
6	Research On Task Scheduling Of Geo-distributed Big-data Processing Jobs
7	Design And Implementation Of Distributed Scheduling And Execution System For Video Website Operation Tasks
8	Research On Multitask Scheduling And Processing Based On Clusters
9	Design And Implementation Of Distributed Timing Task Scheduling System Based On Quartz
10	Design And Implementation Of WEB Services Supporting Distributed Timing Task Scheduling