| In the age of big data, data holds immense value, so more and more companies adopt distributed data warehouses to process big data and to mine and exploit the valuable information it contains. Operating a distributed data warehouse means running a variety of daily tasks for data processing, analysis, and statistics. Some are permanent real-time background tasks, such as monitoring and real-time stream computing. Others are short but complicated timed tasks that require dedicated threads for maintenance. Even with a distributed architecture, the huge volume of data and the large number of tasks still cause many problems, such as data loss caused by server crashes and irrational use of cluster resources. Task scheduling is therefore essential and urgently needed. First, this paper describes the problems that real-time tasks and timed tasks face in a distributed data warehouse and analyzes their operating requirements and execution flows. Second, based on this analysis, it refines the execution flows of the two task types, constructs a life cycle model for real-time and timed tasks modeled on the thread life cycle in operating systems, and proposes scheduling solutions for both task types. Finally, it presents a task scheduling system built on these scheduling and processing solutions. The system uses Hessian, a lightweight remoting-over-HTTP tool, to connect the front end and the back end while greatly reducing coupling. The back end uses the Netty framework, which is built on NIO (non-blocking I/O), because it simplifies network programming. The system uses NIO for non-blocking communication between the server side and the worker side.
Based on the task type and its execution conditions, the scheduling system on the server side sends task execution messages to the worker clusters at the right time, and the workers send feedback messages back to the server side after receiving those messages and completing the tasks. The scheduling system then decides what to do next based on these messages, which reflect the state of the worker side. The scheduler takes server status, network status, client status, task status, resource allocation, load balancing, and many other conditions into account when scheduling and processing timed tasks and real-time tasks. This reduces the pressure on the server, greatly improves system responsiveness, improves the efficiency of task execution, and uses cluster resources rationally. This paper not only improves the efficiency of scheduling and processing real-time and timed tasks, but also provides a reference for other scheduling research. |
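
The thread-like task life cycle and the feedback-driven scheduling decision described above can be illustrated with a minimal sketch. The abstract does not list the paper's actual states or transitions, so the state names (`NEW`, `READY`, `RUNNING`, `BLOCKED`, `TERMINATED`), the transition table, and the helper `on_worker_feedback` are all assumptions borrowed from the textbook operating-system thread model, not the paper's design. The sketch is written in Python for brevity and does not use Hessian or Netty.

```python
from enum import Enum, auto


class TaskState(Enum):
    # Hypothetical states taken from the textbook OS thread model;
    # the abstract does not name the paper's actual states.
    NEW = auto()
    READY = auto()
    RUNNING = auto()
    BLOCKED = auto()
    TERMINATED = auto()


# Allowed transitions (an assumption for illustration only).
TRANSITIONS = {
    TaskState.NEW: {TaskState.READY},
    TaskState.READY: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.READY, TaskState.BLOCKED, TaskState.TERMINATED},
    TaskState.BLOCKED: {TaskState.READY},
    TaskState.TERMINATED: set(),
}


class Task:
    def __init__(self, name):
        self.name = name
        self.state = TaskState.NEW

    def move_to(self, new_state):
        """Apply a transition, rejecting any that the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.name} -> {new_state.name} not allowed"
            )
        self.state = new_state


def on_worker_feedback(task, succeeded):
    """Hypothetical server-side reaction to a worker's feedback message."""
    if succeeded:
        task.move_to(TaskState.TERMINATED)
    else:
        # Put the task back in the ready queue so the scheduler can retry it.
        task.move_to(TaskState.READY)
```

Keeping the transitions in a single table makes illegal state changes fail loudly, which is one plausible way a scheduler could detect inconsistent feedback from the worker side.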