Dynamic scheduling is a method in which the hardware determines which instructions to execute next, as opposed to a statically scheduled machine, in which the compiler determines the order of execution. In essence, the processor executes instructions out of order. Dynamic scheduling is akin to a dataflow machine, in which instructions execute not in the order in which they appear but as their source operands become available. Of course, a real processor must also account for its limited resources. Thus instructions execute based on the availability of both their source operands and the requested functional units.
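The operand-driven firing rule described above can be sketched as follows. This is a minimal illustrative model, not any specific processor: an instruction may issue only when all of its source registers are ready and a functional unit of the required type is free.

```python
# Minimal sketch of dataflow-style issue: an instruction may execute only
# when all its source registers are ready AND a matching unit is free.
# (Illustrative model only; it checks eligibility and does not model
# unit allocation across the issued set.)

def schedulable(instr, ready_regs, free_units):
    """instr = (op, dests, srcs, unit_type)"""
    op, dests, srcs, unit = instr
    return all(s in ready_regs for s in srcs) and free_units.get(unit, 0) > 0

instrs = [
    ("mul", ["r3"], ["r1", "r2"], "fp"),
    ("add", ["r5"], ["r3", "r4"], "fp"),   # depends on the mul's result (r3)
    ("ld",  ["r6"], ["r7"],       "mem"),  # independent: can issue early
]

ready = {"r1", "r2", "r4", "r7"}
units = {"fp": 1, "mem": 1}

issued = [i for i in instrs if schedulable(i, ready, units)]
# The mul and the load may issue; the add must wait until r3 is produced.
```

Note how the load, which appears last in program order, is eligible before the add: availability of operands, not program order, drives execution.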
Runtime data is the data obtained while a program is running (or being executed). That is, when you start a program on a computer, it is runtime for that program. In some programming languages, certain reusable programs or “routines” are built and packaged as a “runtime library.” These routines can be linked to and used by any program while it is running.
What is scheduling?
Scheduling is the process of assigning tasks to resources for execution at particular times. The main objective of scheduling is to map tasks onto processors so as to minimize the makespan and maximize reliability. Task scheduling can be categorized as static or dynamic.
1.2.2 Static Scheduling
Static scheduling is the kind of scheduling in which the execution times of tasks and the data dependencies between them are known in advance. This type of scheduling takes place at compile time; it is also called offline deterministic scheduling, since the task list is not updated with a new ordering at runtime. The static task scheduling problem is known to be NP-complete.
1.2.3 Dynamic Scheduling

In dynamic scheduling, tasks are allocated to processors upon their arrival, and scheduling decisions must be made at run time. These decisions are based on dynamic parameters that may change during run time. In dynamic scheduling, tasks can also be reallocated to other processors while the system runs.
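A minimal sketch of such on-arrival assignment is given below: each task, as it arrives, is placed on whichever processor frees up earliest, with no global lookahead. The task set and the two-processor configuration are illustrative.

```python
import heapq

def dynamic_schedule(arrivals, num_procs):
    """Assign each task to the processor that becomes free earliest.
    arrivals: list of (arrival_time, run_time); decisions are made
    per-task as tasks arrive, with no lookahead (illustrative sketch)."""
    # Min-heap of (time_processor_becomes_free, processor_id).
    procs = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(procs)
    assignment = []
    for arrival, run_time in arrivals:
        free_at, pid = heapq.heappop(procs)
        start = max(arrival, free_at)          # wait for task AND processor
        heapq.heappush(procs, (start + run_time, pid))
        assignment.append((pid, start))
    makespan = max(t for t, _ in procs)        # last processor to go idle
    return assignment, makespan

tasks = [(0, 4), (0, 2), (1, 3), (2, 1)]       # (arrival, run_time)
plan, makespan = dynamic_schedule(tasks, num_procs=2)
# Tasks land on processors 0, 1, 1, 0; everything finishes by time 5.
```

A static scheduler with full knowledge of the task set could sometimes do better, but this scheme needs no advance information, which is exactly the dynamic-scheduling trade-off described above.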
1.2.4 Advantages of Dynamic Scheduling over Static Scheduling
• It is flexible
• It can be faster than static scheduling, since decisions reflect the actual run-time state
1.3 HETEROGENEOUS COMPUTER SYSTEM
Heterogeneous computing consists of applications running on a platform that has more than one computational unit with different architectures, such as a multi-core CPU and a many-core GPU. Using language frameworks such as OpenCL and CUDA, applications running on the CPU launch kernels that can run on either the CPU or the GPU. Generally these kernels perform better on the GPU, as they are optimized for a GPU’s highly parallel architecture and GPUs typically provide higher peak throughput. Therefore, applications preferentially schedule kernels on GPUs, leading to device contention and limiting overall throughput. In some cases, a better scheduling decision runs some kernels on the CPU: even though they take longer than they would on the GPU, they still finish sooner than if they had to wait for the GPU to become free. Furthermore, by utilizing all available processors for computational work, the total throughput of the system is increased over a static schedule that runs each kernel on the fastest device.
Fig. 1 Heterogeneous computing system architecture
1.4 NEED FOR DYNAMIC SCHEDULING
The efficiency of the scheduling scheme, which selects the device between the CPU and the GPU to execute the application, is one of the most critical factors that determine the performance of heterogeneous computing systems. Therefore, efficient scheduling methods for heterogeneous computing systems must be considered. If the programmers determine the device between the CPU and the GPU when they implement the code, the status of the computing system cannot be considered dynamically, as the decision is made statically at compile time, resulting in non-optimal system performance. On the other hand, a dynamic scheduling scheme, which selects the device to execute the application between the CPU and the GPU at runtime, can consider the status of the computing system dynamically, resulting in improved performance. Dynamic scheduling methods typically use information such as device utilization, input data size, and performance prediction to select between the CPU and the GPU.
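The three signals just named — device utilization, input data size, and a performance prediction — can be combined into a runtime decision rule along the following lines. The thresholds and function names here are illustrative assumptions, not measured values from any real system.

```python
def pick_device(gpu_util, data_size, predicted_speedup,
                util_limit=0.9, min_size=1 << 20):
    """Illustrative runtime device choice from the three signals above:
    GPU utilization (0.0-1.0), input size in bytes, and the predicted
    GPU-over-CPU speedup. All thresholds are assumed, not measured."""
    if gpu_util > util_limit:        # GPU already saturated: queue would grow
        return "cpu"
    if data_size < min_size:         # host-device transfer dominates small inputs
        return "cpu"
    return "gpu" if predicted_speedup > 1.0 else "cpu"

# A compute-heavy kernel on a lightly loaded GPU goes to the GPU;
# the same kernel goes to the CPU if the GPU is saturated or the input is tiny.
choice = pick_device(gpu_util=0.2, data_size=1 << 22, predicted_speedup=8.0)
```

Each branch corresponds to one of the failure modes a static, compile-time choice cannot react to: a busy GPU, an input too small to amortize transfer cost, or a kernel that simply does not parallelize well.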
1.5 CPU/GPU performance comparison
Figure 1 shows the evaluation results when four different applications are executed on the CPU and the GPU. The horizontal axis denotes the execution time in seconds. The vertical axis represents the executed applications (MM, IMAGE, MP3, and TPACF). MM (matrix multiplication) and TPACF (one of the Parboil benchmark applications) denote applications requiring highly intensive computation. IMAGE (image processing) represents an application requiring highly intensive computation and frequent I/O operations, whereas MM and TPACF do not require frequent I/O operations. MP3 (an MP3 decoding program) represents an application whose execution time is identical independent of the executing device. We used four different applications to compare the performance of the CPU and the GPU across various application types. A more detailed description of the simulated applications and experimental methods is provided in Sect. 4.1.
As shown in the graph, when MM and TPACF are assigned to the GPU, the execution time is reduced compared to the CPU, because the GPU is more suitable for processing parallel operations due to its many-core architecture. MM and TPACF contain a large amount of computation that can be processed in parallel; therefore, the GPU completes MM and TPACF much faster than the CPU. For the IMAGE application, the GPU cannot provide a large performance gain due to frequent I/O operations, even though image processing is otherwise well suited to parallel execution. Contrary to the other applications, the GPU and the CPU show the same execution time for the MP3 decoding program, since its execution time is bound to the playback time regardless of the device.
Note that the performance gain from the GPU is dependent on the application type. The GPU can obtain a large performance gain when the application requires a large amount of parallel computation, whereas it obtains little or no gain when the application does not. The performance gain of the GPU is also reduced when frequent I/O operations are required. For this reason, the performance of a heterogeneous computing system can be further improved if the dynamic scheduling scheme takes the application type into account.
Fig. 1 Execution time of CPU and GPU
1.6 Motivation

Improving the performance of computing systems by increasing the throughput of the CPU (Central Processing Unit) is restricted by limits such as transistor scaling and temperature constraints. For this reason, solutions that reduce the workload of the CPU while improving the performance of computing systems have been explored. One such solution, when CPU performance is saturated, is utilizing the GPU (Graphics Processing Unit), a highly specialized processor designed for graphics processing. In recent computing systems, the GPU reduces the workload of the CPU by processing complicated graphics-related computations in its place. Moreover, recent GPUs can process general-purpose applications as well as graphics-related applications with the help of integrated development environments such as CUDA, OpenCL, Cg, HLSL, and OpenGL [3–5]. In particular, CUDA (Compute Unified Device Architecture), developed by NVIDIA, has been widely used, since it introduced a data-parallel programming API (Application Programming Interface) based on the C language already familiar to users. By using CUDA, programmers can easily utilize GPU resources to reduce the workload of the CPU while improving the performance of computing systems [7, 8]. Therefore, heterogeneous computing systems using both the CPU and the GPU together can be a solution for improving system performance while reducing the workload of the CPU. The efficiency of the scheduling scheme, which selects the device between the CPU and the GPU to execute the application, is one of the most critical factors that determine the performance of heterogeneous computing systems. For this reason, efficient scheduling methods for heterogeneous computing systems must be considered.
Since a static, compile-time choice of device cannot reflect the run-time status of the system, many researchers have focused on dynamic scheduling schemes for heterogeneous computing systems. These schemes typically use information such as device utilization, input data size, and performance prediction to select between the CPU and the GPU dynamically, and they can be implemented through the OS (Operating System) [10–12].
In this work, we propose a new dynamic scheduling scheme that improves the performance of the heterogeneous computing system by using both the execution history of the incoming application and the remaining execution time of the currently executing application.
1.7 Problem Statement
Scheduling computational work for heterogeneous computer systems is substantially different from scheduling for systems with homogeneous processing cores. An application can have a drastically different running time when assigned to different devices: a GPU kernel can run as much as a hundred times faster than its CPU counterpart. Also, GPU processors do not have the capability to time-slice workloads, i.e., kernels launched on a GPU run sequentially, one at a time. The latest GPUs have a limited ability to run multiple kernels in parallel, but this requires careful coordination to ensure that all kernels and their data fit onto the card and that they have no dependencies. Without the ability to time-slice, a kernel launched behind other kernels must wait until all of them finish completely before starting its own work. In a time-sliced environment, a scheduler could interleave kernels, enabling kernels with a small amount of work to finish sooner than if they had to wait for longer kernels to run to completion.
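The cost of this first-in-first-out behavior can be made concrete with a small model. The sketch below assumes strictly sequential kernel execution and ignores launch overhead; the kernel times are hypothetical.

```python
def fifo_finish_times(kernel_times):
    """On a GPU without time-slicing, queued kernels run strictly one
    after another: each kernel's finish time includes everything queued
    ahead of it (simplified model; launch overhead ignored)."""
    finish, t = [], 0.0
    for k in kernel_times:
        t += k
        finish.append(t)
    return finish

# A 1 ms kernel queued behind two 100 ms kernels finishes at 201 ms,
# even though its own work takes only 1 ms.
times = fifo_finish_times([100, 100, 1])   # -> [100.0, 200.0, 201.0]
```

This is precisely why sending a short kernel to the slower CPU can be the better decision: running it immediately on the CPU, even at, say, 20 ms, beats waiting 200 ms for the GPU queue to drain.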
Since heterogeneous systems involve the use of two processors, a CPU and a GPU, in a single computing system, we propose a scheduling technique that selects the device on which to execute an application, between the CPU and the GPU, based on historic runtime information. An application can have a drastically different running time when assigned to different devices. Because of these runtime differences, assigning an application to one of the two devices necessitates knowing which device will allow the application to run faster.
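The idea of selecting a device from historic runtime information can be sketched as follows. This is a minimal illustration of the approach, not the thesis's actual implementation; the class name, the averaging rule, and the default choice for unseen applications are all assumptions made for the example.

```python
from collections import defaultdict

class HistoryScheduler:
    """Sketch of history-based device selection: record observed run
    times per (application, device) and dispatch each application to
    whichever device has the lower historical average. Applications
    with no history default to the GPU purely as an illustrative
    tie-break rule."""

    def __init__(self):
        self.history = defaultdict(list)   # (app, device) -> [runtimes]

    def record(self, app, device, runtime):
        self.history[(app, device)].append(runtime)

    def _avg(self, app, device):
        runs = self.history[(app, device)]
        return sum(runs) / len(runs) if runs else None

    def pick(self, app):
        cpu, gpu = self._avg(app, "cpu"), self._avg(app, "gpu")
        if cpu is None or gpu is None:
            return "gpu"                   # no history yet: assumed default
        return "cpu" if cpu < gpu else "gpu"

sched = HistoryScheduler()
sched.record("MM", "cpu", 12.0)    # matrix multiplication: GPU-friendly
sched.record("MM", "gpu", 1.5)
sched.record("MP3", "cpu", 3.0)    # playback-bound: same time on both
sched.record("MP3", "gpu", 3.0)
```

Under this rule MM is dispatched to the GPU, while an application the scheduler has never seen falls back to the default until both devices have been sampled at least once.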
1.8 Research Objective
The aim of this thesis is to provide methodologies, strategies, and mechanisms that add allocation and scheduling capabilities for tasks that must be executed by heterogeneous systems. By this means, applications can be dynamically configured over the asymmetric architecture to use the most appropriate computational resources and thereby reduce the tasks’ execution time. This aims at determining an algorithm that mixes CPU-bound and GPU-bound jobs so that the total time taken is significantly diminished.
1.9 Organization of Thesis
This research work was motivated by the need to schedule both the CPU and the GPU in a single system in order to improve performance and throughput; the running times of different jobs are used throughout. The work consists of six chapters. Chapter one is the introduction, in which we discuss dynamic scheduling, heterogeneous systems, the need for dynamic scheduling, the CPU/GPU performance comparison, the motivation, and the problem statement. Chapter two presents the literature review and chapter three the proposed methodology. The implementation, results, and discussion are explained in chapter four. Chapter five concludes the work, and chapter six lists the references.