Hour Minimization Based Resource Allocation for Deadline Constrained Scientific Workflow Application in Cloud Computing

S. K. Jeya Brindha, (M.Tech.), PG Scholar

J. Angela Jennifa Sujana, M.E., (Ph.D.), Assistant Professor (Sr. Grade)

Dr. T. Revathi, M.E., Ph.D., Senior Professor & Head

Department of IT, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India

Abstract – Demand for workflow applications in the cloud is growing, which raises the problem of minimizing both the resources and the VM instance hours they consume. We investigate four requirements for deploying a deadline-constrained workflow application: (1) meeting the end-to-end deadline; (2) minimizing the number of VM instances allotted to the application; (3) minimizing the application's makespan by establishing an application execution schedule; and (4) determining a VM operation schedule. First, the lower and upper bounds for the end-to-end deadline must be determined. We then develop a heuristic Minimal Slack time and Minimal Distance (MSMD) algorithm to satisfy the first two requirements. Once the application execution schedule and the appropriate VM instances are determined, the proposed Instance Hour Minimization (IHM) algorithm minimizes the instance hours. Experimental results show that the MSMD algorithm achieves a larger resource reduction rate than HEFT and MOHEFT. We also compare schedules with and without the IHM algorithm to confirm the reduction in execution instance hours.

Keywords - Makespan Minimization, Resource Minimization, Instance Hour Minimization, Cloud, End-to-End Deadline, MSMD, Scheduling.

INTRODUCTION

Cloud computing attracts applications with large storage and processing demands. It gives users access to seemingly unlimited resources under a "pay-as-you-go" business model. Deadline-constrained applications increasingly look to the cloud for resources. Examples include online media streaming, interactive e-learning, and online banking systems [1]. However, deploying such applications on the cloud raises a technical issue: how to minimize the deployment cost while guaranteeing the deadline requested by the application [2].

In a public cloud, the economic cost of renting resources is referred to as the application's execution cost. Most researchers have focused on minimizing this cost by reducing the application's makespan. When the economic cost is treated as a QoS requirement, scheduling on the cloud becomes a multi-objective optimization problem [3]. For a deadline-constrained application, meeting the deadline is the difficult part; at the same time, there is no need to finish the application ahead of its deadline.

In this paper, we address the number of VM instances and the VM instance hours consumed by deadline-constrained applications in cloud computing. The proposed work minimizes the application's makespan on the allotted resources and the VM instance hours. First, the Minimal Slack time and Minimal Distance (MSMD) algorithm reduces the number of VM instances and the makespan of the application schedule. Then, the Instance Hour Minimization (IHM) algorithm reduces the VM instance hours of that schedule. The application execution schedule on the VM instances must guarantee the following conditions: 1) the application meets its deadline; 2) the minimum number of VM instances is used to execute the application's tasks; 3) the application's makespan on those VM instances is minimized; 4) the VM instance hours consumed by the allocated tasks are minimized.

The rest of the paper is organized as follows: Section 2 discusses the related work. In Section 3, we introduce the terms and models. Section 4 presents the heuristic MSMD algorithm for minimizing the resource makespan and VM instance. Section 5 presents the IHM algorithm for minimizing the VM instance hour. In Section 6, we deal with experimental results and evaluation. Section 7 concludes our proposed work.

RELATED WORK

Scheduling algorithms have been devised for various DAG-based applications. Many researchers tackle makespan minimization by ordering an application's tasks [4]. In a DAG-based application, task dependencies are respected by following a topological order. The min-min algorithm schedules independent tasks [5]. However, the min-min algorithm or the Coffman-Graham (CG) algorithm can be directly applied to DAG-based applications.

Several algorithms use task prioritization for makespan minimization. The heterogeneous-earliest-finish-time (HEFT) algorithm [4] is a list-based heuristic that prioritizes tasks to minimize the makespan. The duplication-based bottom-up scheduling (DBUS) algorithm [2] is another heuristic, which duplicates tasks on each VM during the scheduling phase.

Meanwhile, an application scheduled on the cloud must meet its deadline while minimizing cost [6].

QoS-based workflow scheduling minimizes the execution cost of applications deployed on the cloud while meeting their deadlines [7]. The MOHEFT algorithm optimizes the makespan and cost of an application in a public cloud [8], [11]. These cost minimization approaches do not consider charges based on instance hours or minutes, even though reducing a VM's instance hours reduces the application's cost. The auto-scaling scheduling algorithm [9] minimizes cost based on instance hours.

Tasks can also be replicated on different VMs during idle time: the RIPCP (Replication-based IC-PCP) algorithm uses replication to improve performance rather than fault tolerance [10]. Similar to the auto-scaling algorithm, we propose the IHM algorithm to reduce VM instance hours under a deadline constraint for DAG-based applications.

MODEL AND PROBLEM FORMULATION

In this section, we present the system model, terms, and problem formulation used in our work.

Application Model

A deadline-constrained application $A$ is modeled as a weighted directed acyclic graph (DAG) $A(T, E)$, where each task $T_i \in A$ has weight $w(T_i)$, its execution time. An edge $(T_i, T_j) \in E$ represents the dependency between $T_i$ and $T_j$: every task can start only after all its predecessors complete. Each application has a release time $T_R$ and a relative end-to-end deadline $T_D$, the time interval from when the application is released to when it must be finished. Accordingly, the application's absolute end-to-end deadline is $T_R + T_D$.

System Model

A set of virtual machine (VM) instances is represented as $C = \{C_1, \dots, C_M\}$. We assume there is only one VM instance type, i.e., all VMs run at the same unit speed. Each VM instance executes only one task at a time, and task executions are non-preemptive. VM instances are charged per unit time, in hours. The operation pattern of VM instance $C_i$ is the set of time intervals $P_i = \{(\mathit{on}_i^1, \mathit{off}_i^1), \dots, (\mathit{on}_i^n, \mathit{off}_i^n)\}$, where $\mathit{on}_i^j$ and $\mathit{off}_i^j$ are the $j$-th power-on and power-off times, respectively. Given a VM instance's operation pattern $P_i$, the instance hours needed are

$$H(P_i) = \sum_{j=1}^{|P_i|} \left\lceil \frac{\mathit{off}_i^j - \mathit{on}_i^j}{U} \right\rceil \tag{1}$$

where $U$ is the pricing time unit. Hence, if the instance hours are minimized for each VM instance, the total cost is minimized.
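Equation (1) can be sketched in a few lines of Python. The function name `instance_hours` and the sample pattern are illustrative assumptions, not part of the original work; times are expressed in the same unit as `unit`.

```python
import math

def instance_hours(pattern, unit=1.0):
    """Equation (1): sum of ceil((off - on) / U) over all (on, off) intervals."""
    return sum(math.ceil((off - on) / unit) for on, off in pattern)

# Hypothetical operation pattern with two power-on intervals (times in hours).
pattern = [(0.0, 1.5), (4.0, 4.25)]
print(instance_hours(pattern))  # ceil(1.5) + ceil(0.25) = 2 + 1 = 3
```

Note that every interval is rounded up separately, which is why splitting a VM's activity into many short intervals is not always cheaper than leaving it on.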

Preliminaries

For a given DAG-based application, we define the following terms, which are used to formulate an application schedule that minimizes the number of VM instances and the instance hours.

Predecessors ($pred(T_i)$) and Successors ($succ(T_i)$)

The predecessors and successors of a task are defined as:

$$pred(T_i) = \{T_j \mid T_j \in A \wedge (T_j, T_i) \in E\} \tag{2}$$

$$succ(T_i) = \{T_j \mid T_j \in A \wedge (T_i, T_j) \in E\} \tag{3}$$

Application Sequential Execution Time ($T_{seq}$)

The sequential execution time is the sum of all tasks' execution times, $T_{seq} = \sum_{T_i \in A} w(T_i)$.

Earliest Start Time ($EST(T_i)$) and Latest Finish Time ($LFT(T_i)$)

The earliest start time and latest finish time of a task are defined as:

$$EST(T_i) = \begin{cases} T_R & \text{if } T_i = T_{entry} \\ \max_{T_k \in pred(T_i)} \{EST(T_k) + w_k\} & \text{otherwise} \end{cases} \tag{4}$$

$$LFT(T_i) = \begin{cases} T_R + T_D & \text{if } T_i = T_{exit} \\ \min_{T_k \in succ(T_i)} \{LFT(T_k) - w_k\} & \text{otherwise} \end{cases} \tag{5}$$

where $w_k = w(T_k)$.
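The two recurrences can be evaluated with one forward and one backward pass over a topological order. The sketch below is only an illustration: the function name `est_lft`, the dictionary-based graph encoding, and the diamond-shaped workflow are assumptions, and tasks without predecessors (successors) are treated as entry (exit) tasks. It uses `graphlib`, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter

def est_lft(weights, preds, t_r, t_d):
    """Return (EST, LFT) dictionaries following equations (4) and (5)."""
    # Derive successor lists from the predecessor lists.
    succs = {t: [] for t in weights}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)

    order = list(TopologicalSorter(preds).static_order())

    est = {}
    for t in order:  # predecessors are visited before t
        est[t] = max((est[p] + weights[p] for p in preds[t]), default=t_r)

    lft = {}
    for t in reversed(order):  # successors are visited before t
        lft[t] = min((lft[s] - weights[s] for s in succs[t]), default=t_r + t_d)
    return est, lft

# Hypothetical diamond-shaped workflow: A -> {B, C} -> D.
weights = {"A": 2, "B": 3, "C": 1, "D": 2}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
est, lft = est_lft(weights, preds, t_r=0, t_d=10)
print(est["D"], lft["A"])  # 5 5
```

Here $EST(D) = \max(2 + 3, 2 + 1) = 5$ and $LFT(A) = \min(8 - 3, 8 - 1) = 5$, matching the printed values.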

Schedule Start Time ($SST(T_i)$) and Schedule Finish Time ($SFT(T_i)$)

The schedule start time and schedule finish time denote when a task actually executes in a schedule on a VM instance; they may differ from the earliest start time and latest finish time. Specifically, $SST(T_i) \ge EST(T_i)$ and $SFT(T_i) = SST(T_i) + w_i$.

Ready Time ($ready(T_i)$)

The ready time of a task is defined as:

$$ready(T_i) = \max_{T_k \in pred(T_i)} \{SFT(T_k)\} \tag{6}$$

Maximal Slack Time ($mslack(T_i)$)

The maximal slack time of a task is defined as:

$$mslack(T_i) = LFT(T_i) - \bigl(EST(T_i) + w(T_i)\bigr) \tag{7}$$
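As a quick worked instance of equation (7), assume a hypothetical task with $EST(T_i) = 2$, $LFT(T_i) = 8$ and $w(T_i) = 3$ (values chosen only for illustration):

```latex
\begin{align*}
mslack(T_i) &= LFT(T_i) - \bigl(EST(T_i) + w(T_i)\bigr) \\
            &= 8 - (2 + 3) \\
            &= 3
\end{align*}
```

Such a task could start up to 3 time units after its earliest start time and still finish by its latest finish time.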

Topological Level ($Lev(T_i)$)

The topological level of a task in a DAG-based application is defined as:

$$Lev(T_i) = \begin{cases} 0 & \text{if } T_i = T_{entry} \\ \max_{T_k \in pred(T_i)} \{Lev(T_k)\} + 1 & \text{otherwise} \end{cases} \tag{8}$$
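A minimal sketch of equation (8) follows, assuming the DAG is given as a dictionary that maps each task to its list of predecessors (an encoding chosen here for illustration). It relies on `graphlib` from Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def topological_levels(preds):
    """Equation (8): entry tasks get level 0, others 1 + max level of predecessors."""
    lev = {}
    for t in TopologicalSorter(preds).static_order():
        # default=-1 makes a task without predecessors land on level 0.
        lev[t] = max((lev[p] for p in preds[t]), default=-1) + 1
    return lev

# Hypothetical diamond-shaped workflow: A -> {B, C} -> D.
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
levels = topological_levels(preds)
print(levels["A"], levels["B"], levels["C"], levels["D"])  # 0 1 1 2
```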

Critical Path ($P_c$)

The critical path is the path of the DAG with the longest execution time; it starts at the entry task $T_{entry}$ and ends at the exit task $T_{exit}$.
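The critical path can be found with a longest-path dynamic program over a topological order. This is a sketch under the same assumed encoding as above (a `weights` dictionary and a `preds` dictionary); the function name `critical_path` is hypothetical.

```python
from graphlib import TopologicalSorter

def critical_path(weights, preds):
    """Return (path, length) of the longest weighted path in the DAG."""
    length = {}  # longest execution time of any path ending at each task
    parent = {}  # predecessor on that longest path (None for entry tasks)
    for t in TopologicalSorter(preds).static_order():
        parent[t] = max(preds[t], key=length.get, default=None)
        base = length[parent[t]] if parent[t] is not None else 0
        length[t] = base + weights[t]

    # Walk back from the task with the largest accumulated length.
    node = max(length, key=length.get)
    total = length[node]
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], total

# Hypothetical diamond-shaped workflow: A -> {B, C} -> D.
weights = {"A": 2, "B": 3, "C": 1, "D": 2}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(critical_path(weights, preds))  # (['A', 'B', 'D'], 7)
```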

Problem Formulation

Based on the models and preliminaries above, we formulate the problem through the following objectives:

Objective 1: Minimize the Number of VM Instances

$$\min M$$

subject to

$$SFT(T_{exit}) \le T_R + T_D \tag{9}$$

$$\forall T_i \in A,\ \sum_{j=1}^{M} S(T_i, C_j) = 1 \tag{10}$$

where $S(T_i, C_j) = 1$ if task $T_i$ is allocated to VM instance $C_j$ and $0$ otherwise, so that each task is allocated to exactly one VM instance.

Objective 2: Minimize the Makespan on the VM Instances

$$\min SFT(T_{exit})$$

subject to

$$\forall T_i \in A,\ \sum_{j=1}^{M} S(T_i, C_j) = 1 \tag{11}$$

Objective 3: Minimize Total VM Instance Hours

$$\min \sum_{i=1}^{M} H(P_i) \tag{12}$$

HEURISTIC MSMD SCHEDULING ALGORITHM

In this section, we present the heuristic MSMD scheduling algorithm, which minimizes the number of VM instances and the makespan of the application on those instances. It searches for a schedule that satisfies the application's deadline. The schedule search consists of two phases: 1) a prioritization phase and 2) a scheduling phase. The prioritization phase is based on topological levels and minimal slack time. The scheduling phase is based on the minimal distance between a resource's available time and a task's ready time.

Prioritization Phase

In the prioritization phase, each task is assigned a priority based on its topological level: the lower the level, the higher the priority. Tasks at the same level are prioritized by minimum slack time, i.e., the task with less slack gets the higher priority. Finally, the tasks are sorted in decreasing order of priority.
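The prioritization rule amounts to a two-key sort. In the sketch below, `prioritize`, `level` and `slack` are illustrative names, and the level and slack values are hypothetical inputs that would normally come from equations (8) and (7).

```python
def prioritize(tasks, level, slack):
    """Order tasks by topological level, breaking ties by smaller slack time."""
    return sorted(tasks, key=lambda t: (level[t], slack[t]))

# Hypothetical levels and maximal slack times for four tasks.
level = {"A": 0, "B": 1, "C": 1, "D": 2}
slack = {"A": 3, "B": 3, "C": 5, "D": 3}
print(prioritize(["D", "C", "B", "A"], level, slack))  # ['A', 'B', 'C', 'D']
```

B precedes C because both sit on level 1 and B has less slack.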

Scheduling Phase

In the scheduling phase, the heuristic MSMD algorithm finds the minimum number of VM instances while reducing the application's makespan. The distance indicates how close a task's ready time is to a VM instance's earliest available time $av(c_j)$. It is calculated as follows:

$$Dis(T_i, c_j) = \begin{cases} av(c_j) & \text{if } ready(T_i) < av(c_j) \\ ready(T_i) - av(c_j) & \text{otherwise} \end{cases} \tag{13}$$
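Equation (13) translates directly into a small helper. The name `distance` and its two scalar arguments (a task's ready time and a VM instance's earliest available time) are assumptions made for this sketch.

```python
def distance(ready, available):
    """Equation (13): distance between a task's ready time and a VM's available time."""
    if ready < available:
        return available
    return ready - available

print(distance(5, 3))  # VM is free before the task is ready: 5 - 3 = 2
print(distance(2, 6))  # VM is still busy when the task is ready: 6
```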

Algorithm 1: Schedule Searching

Input : Application: G(V,E), T_R, T_D

Output: A schedule that satisfies SFT(T_exit) ≤ T_R + T_D

    L ← prioritize(G);                   // ordered list
    m ← ⌈T_seq / T_D⌉;
    do
        S[m] ← {ϕ};                      // VM job queues
        SFT(T_exit) ← MSMD(G, m, L, S);
        m ← m + 1;
    while SFT(T_exit) > T_R + T_D;
    return S

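Algorithm 1 shows the pseudocode of the schedule search. Line 1 prioritizes the tasks by topological level and slack time. Lines 3-8 find the minimum number of VM instances needed to allocate the tasks in a schedule that satisfies the deadline constraint: starting from the lower bound $\lceil T_{seq}/T_D \rceil$, the number of instances grows by one until the schedule returned by MSMD meets the deadline.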
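The search loop can be sketched as follows. This is not the authors' implementation: `search_schedule`, the `max_m` safety cap, and the `round_robin` stand-in scheduler (which ignores dependencies and assumes a release time of 0) are all hypothetical and only serve to make the loop runnable.

```python
import math

def search_schedule(t_seq, t_r, t_d, schedule_fn, max_m):
    """Grow the number of VM instances m until the schedule meets the deadline."""
    m = max(1, math.ceil(t_seq / t_d))  # lower bound on the number of instances
    while m <= max_m:
        finish, schedule = schedule_fn(m)
        if finish <= t_r + t_d:
            return m, schedule
        m += 1
    return None  # no feasible schedule within the cap

def round_robin(weights):
    """Toy scheduler for independent tasks, standing in for MSMD."""
    def schedule_fn(m):
        queues = [[] for _ in range(m)]
        for i, w in enumerate(weights):
            queues[i % m].append(w)
        return max(sum(q) for q in queues), queues
    return schedule_fn

# Six independent tasks of length 2, deadline 3: the bound ceil(12/3) = 4 is not enough.
m, schedule = search_schedule(12, 0, 3, round_robin([2] * 6), max_m=6)
print(m)  # 6
```

With 4 or 5 instances at least one queue holds two tasks and finishes at time 4, so the loop continues until each task has its own instance.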

Algorithm 2: MSMD (G, m, L, S)

    T[m] ← {0};                          // av(c)
    P_c ← G's critical path;
    for i ← 0 to |L| - 1 do
        if L[i] ∈ P_c or m = 1 then
            Assign L[i] to S[0]
        end
        else
            T[0] ← max{av(0), T[0]};
            minDisComp ← 0;
            distance ← Dis(L[i], 0);
            for j ← 1 to m - 1 do
                T[j] ← av(S[j]);
                if Dis(L[i], j) < distance then
                    distance ← Dis(L[i], j);
                    minDisComp ← j;
                end
            end
            Assign L[i] to S[minDisComp];
            if minDisComp = 0 then
                T[0] ← T[0] + w(L[i])
            end
        end
    end
    return SFT(T_exit)

Algorithm 2 depicts the pseudocode of the minimal slack time and minimal distance (MSMD) algorithm. Lines 3-5 assign the tasks on the critical path to the same VM instance. Lines 7-20 find the VM instance with the minimum distance between its available time and the current task's ready time. Together with Algorithm 1, the heuristic MSMD algorithm yields a schedule that uses the minimum number of VM instances found by the search while satisfying the application's deadline constraint.
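One possible reading of Algorithm 2 is sketched below. Several details are assumptions rather than statements from the paper: a task starts at the later of its ready time and the chosen VM's available time, ties in distance go to the lowest-numbered VM, and the inputs (`order`, `weights`, `preds`, `critical`) use a hypothetical dictionary encoding.

```python
def msmd(order, weights, preds, critical, m, t_r=0):
    """Assign tasks to m VMs; return (schedule finish times, per-VM queues)."""
    def dist(ready, available):  # equation (13)
        return available if ready < available else ready - available

    av = [t_r] * m                   # earliest available time of each VM
    queues = [[] for _ in range(m)]  # (task, start, finish) per VM
    sft = {}
    for t in order:                  # order must list predecessors first
        ready = max((sft[p] for p in preds[t]), default=t_r)
        if t in critical or m == 1:
            vm = 0                   # critical-path tasks share VM 0
        else:
            vm = min(range(m), key=lambda j: dist(ready, av[j]))
        start = max(ready, av[vm])   # assumed start rule
        sft[t] = start + weights[t]
        av[vm] = sft[t]
        queues[vm].append((t, start, sft[t]))
    return sft, queues

# Hypothetical diamond-shaped workflow with critical path A -> B -> D.
weights = {"A": 2, "B": 3, "C": 1, "D": 2}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
sft, queues = msmd(["A", "B", "C", "D"], weights, preds, {"A", "B", "D"}, m=2)
print(sft["D"], queues[1])  # 7 [('C', 2, 3)]
```

Task C is ready at time 2; VM 0 is busy until 5 (distance 5) while VM 1 has been free since 0 (distance 2), so C goes to VM 1 and the makespan equals the critical path length.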

HOUR MINIMIZATION FOR RESOURCE ALLOCATION

In this section, we present an application scheduling algorithm that minimizes the instance hours, and thereby the economic cost of resource utilization in the cloud. To minimize the total VM instance hours, we need to find the operation pattern of each VM instance. Whether to shut down a VM during idle time depends on which of the following cases the idle time falls into. For two consecutive tasks $T_1$ and $T_2$ with execution times $e_1$ and $e_2$ on a given VM instance, let $b$ be the idle time between them and $U$ the pricing time unit. The three cases are:

C1: $0 \le b \le U$.

C2: $U \le b \le 2U$.

C3: $b \ge 2U$.
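To see why the length of the idle gap matters, consider an illustrative comparison that is not taken from the source. Assume $U = 60$ minutes, $e_1 = e_2 = 50$ minutes, $b = 30$ minutes (case C1), and that shutting down and restarting a VM takes no time. Billing each powered-on interval by equation (1):

```latex
\begin{align*}
\text{keep running:} \quad & \left\lceil \frac{e_1 + b + e_2}{U} \right\rceil
  = \left\lceil \frac{50 + 30 + 50}{60} \right\rceil = 3 \\
\text{shut down in between:} \quad & \left\lceil \frac{e_1}{U} \right\rceil + \left\lceil \frac{e_2}{U} \right\rceil
  = \left\lceil \frac{50}{60} \right\rceil + \left\lceil \frac{50}{60} \right\rceil = 2
\end{align*}
```

With $e_1 = 30$, $e_2 = 20$ and $b = 5$ the comparison flips (1 instance hour when kept running versus 2 when shut down), so under these assumptions the decision depends on the task lengths as well as on the gap.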

Algorithm 3: Instance Hour Minimization Algorithm

Input : Set of schedules S = {S_1, …, S_n}

Output: P_i for each schedule S_i

    for i ← 1 to |S| do
        if ⌈(∑_{k=1}^{|S_i|} w(S_i[k])) / U⌉ = ⌈(SFT(S_i[|S_i|]) − SST(S_i[1])) / U⌉ then
            on ← SST(S_i[1]), off ← SFT(S_i[|S_i|]);
            push (on, off) → P_i;
            continue;
        end
        else
            T_1 ← S_i[1];
            for j ← 2 to |S_i| do
                T_2 ← S_i[j], b ← SST(T_2) − SFT(T_1);
                Decide if need to shut down the VM during idle time based on C1, C2 and C3;
                if need to shut down then
                    push the operation pattern for T_1 → P_i, T_1 ← S_i[j], j ← j+1;
                end
                else
                    Combine T_1 and T_2 as a new T_1;
                    j ← j+1;
                end
            end
        end
    end

Algorithm 3 gives the pseudocode of the instance hour minimization algorithm. In Lines 2-5, if the instance hours needed for the total task execution time equal those needed for the span from the first task's start to the last task's finish, the VM instance is simply turned on at the first task and turned off after the last task. Lines 8-19 handle the remaining case: when there is idle time between consecutive tasks, the shutdown decision for the VM is made based on cases C1, C2 and C3, which minimizes the VM instance hours.
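The structure of Algorithm 3 for a single VM can be sketched as below. The paper does not spell out the decision made in each of the cases C1 to C3, so the decision is passed in as a callback; the `ceiling_rule` shown here (shut down only when that lowers the rounded-up hours of the two blocks) is purely an assumption, as are the function names and the sample schedule.

```python
import math

def ihm(schedule, unit, shut_down):
    """Return the operation pattern [(on, off), ...] for one VM's task list.

    schedule: list of (SST, SFT) pairs sorted by start time.
    shut_down(e1, b, e2, unit): True if the VM should be off during the gap b.
    """
    first_on, last_off = schedule[0][0], schedule[-1][1]
    busy = sum(sft - sst for sst, sft in schedule)
    # Keeping the VM on throughout costs no extra hour: one interval suffices.
    if math.ceil(busy / unit) == math.ceil((last_off - first_on) / unit):
        return [(first_on, last_off)]

    pattern = []
    on, off = schedule[0]                 # current block, playing the role of T_1
    for sst, sft in schedule[1:]:
        idle = sst - off                  # b = SST(T_2) - SFT(T_1)
        if shut_down(off - on, idle, sft - sst, unit):
            pattern.append((on, off))     # close the current block
            on, off = sst, sft
        else:
            off = sft                     # merge T_2 into the current block
    pattern.append((on, off))
    return pattern

def ceiling_rule(e1, b, e2, unit):
    """Assumed rule: shut down only if billing the two blocks separately is cheaper."""
    return math.ceil(e1 / unit) + math.ceil(e2 / unit) < math.ceil((e1 + b + e2) / unit)

# Hypothetical schedule in minutes with a 60-minute pricing unit.
print(ihm([(0, 50), (80, 130), (400, 420)], 60, ceiling_rule))
# [(0, 50), (80, 130), (400, 420)]
```

In this example the resulting pattern costs 3 instance hours by equation (1), whereas leaving the VM on from 0 to 420 would cost 7.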

EXPERIMENTAL RESULTS

To evaluate the proposed method, we used the CloudSim toolkit for simulation. The MSMD and IHM algorithms are evaluated against the HEFT and MOHEFT algorithms. For evaluation, we used DAG-based applications such as Montage, CyberShake, and Epigenomics (GENOME). The performance metrics are the resource reduction rate, the makespan reduction rate, and the instance hour increase rate.

For the MSMD algorithm, the number of VM instances used is assessed with the resource reduction rate, defined as:

$$\text{Resource Reduction Rate} = \frac{\text{Upper Bound} - \text{Actual Resources Used}}{\text{Upper Bound}} \tag{14}$$

The upper bound on resources is defined as:

$$\text{Upper Bound} = |T| - Lev(T_{exit}) \tag{15}$$

where $|T|$ is the number of tasks in the application.

The makespan of an application is assessed with the makespan reduction rate, defined as:

$$\text{Makespan Reduction Rate} = \frac{T_R + T_D - SFT(T_{exit})}{T_D} \tag{16}$$

For the IHM algorithm, the instance hours of an application are assessed with the instance hour increase rate, defined as:

$$\text{Instance Hour Increase Rate} = \frac{IH(A) - \lceil T_{seq}/U \rceil}{\lceil T_{seq}/U \rceil} \tag{17}$$

where $IH(A)$ denotes the total instance hours used to execute application $A$.
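The three metrics are simple ratios and can be computed as in the sketch below. The function and parameter names are illustrative, and the numbers in the usage lines are made up for the example, not experimental results.

```python
import math

def resource_reduction_rate(num_tasks, exit_level, used):
    """Equations (14) and (15): (upper bound - used) / upper bound."""
    upper_bound = num_tasks - exit_level
    return (upper_bound - used) / upper_bound

def makespan_reduction_rate(t_r, t_d, sft_exit):
    """Equation (16): unused fraction of the relative deadline."""
    return (t_r + t_d - sft_exit) / t_d

def instance_hour_increase_rate(instance_hours, t_seq, unit):
    """Equation (17): overhead relative to the sequential lower bound."""
    baseline = math.ceil(t_seq / unit)
    return (instance_hours - baseline) / baseline

# Hypothetical values, for illustration only.
print(resource_reduction_rate(num_tasks=20, exit_level=4, used=2))         # 0.875
print(makespan_reduction_rate(t_r=0, t_d=100, sft_exit=95))                # 0.05
print(instance_hour_increase_rate(instance_hours=5, t_seq=230, unit=60))   # 0.25
```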

Fig.1. Overall Performance

Fig. 1 shows the average resource reduction rate and the average makespan reduction rate of each algorithm. The MSMD algorithm has the largest resource reduction rate, whereas the MOHEFT algorithm has the largest makespan reduction rate. On average, the MSMD algorithm achieves a resource reduction rate of 89% and a makespan reduction rate of 3.4%. Compared to the other algorithms, MSMD therefore performs better on resource reduction.

Fig.2. Overall Instance Hour Including Comparison with Performing IHM

Fig. 2 shows the average instance hour increase rate of each algorithm. The MSMD algorithm has the lowest instance hour increase rate, at 8%. The IHM algorithm reduces the instance hours of an application, which helps to reduce the billed execution time of a schedule.

CONCLUSION

In this paper, we introduced the heuristic MSMD algorithm and the IHM algorithm to tackle the deployment of deadline-constrained applications in the cloud. Together they 1) achieve the end-to-end deadline, 2) minimize the number of VM instances, 3) minimize the makespan on the VM instances, and 4) minimize the total VM instance hours. The experimental results indicate that the MSMD algorithm meets the deadline of a deadline-constrained application with fewer resources than the other algorithms. Applying the IHM algorithm to an application reduces the instance hours required to execute its tasks.

REFERENCES

[1] TEMENOS. (2013). Temenos. [Online]. Available: www.temenos.com

[2] D. Bozdag, U. Catalyurek, and F. Ozguner, “A task duplication based bottom-up scheduling algorithm for heterogeneous environments,” in Proc. IEEE 20th Int. Parallel Distrib. Process. Symp., 2006, pp. 12–pp.

[3] S. Jayadivya and S. M. S. Bhanu, “Qos based scheduling of workflows in cloud computing,” Int. J. Comput. Sci. Electrical Eng., vol. 1, no. 1, pp. 15–21, 2012.

[4] H. Topcuoglu, S. Hariri, and M.-Y. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 3, pp. 260–274, Mar. 2002.

[5] M. Maheswaran, S. Ali, H. Siegal, D. Hensgen, and R. F. Freund, “Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems,” in Proc. IEEE Eighth Heterogeneous Comput. Workshop, 1999, pp. 30–44.

[6] J. Yu, R. Buyya, and C. Tham, “Cost-based scheduling of scientific workflow applications on utility grids,” in Proc. IEEE First Int. Conf. e-Science and Grid Comput., 2005, pp. 8–pp.

[7] L. M. Khanli and M. Analoui, “Qos-based scheduling of workflow applications on grids,” in Proc. Third Conf. IASTED Int. Conf.: Adv. Comput. Sci. Technol., ser. ACST’07, Anaheim, CA, USA, 2007, pp. 276–282.

[8] J. J. Durillo, R. Prodan, and H. M. Fard, “Moheft: a multi-objective list-based method for workflow scheduling,” in Proc. IEEE 4th Int. Conf. Cloud Comput. Technol. Sci., 2012, pp. 185–192.

[9] M. Mao and M. Humphrey, “Auto-scaling to minimize cost and meet application deadlines in cloud workflows,” in Proc. IEEE Int. Conf. High Perform. Comput., Netw. Storage Anal., 2011, pp. 1–12.

[10] J. Angela Jennifa Sujana, T. Revathi, and M. Malarvizhili, “Scheduling of scientific workflows in cloud with replication,” Applied Mathematical Sciences, vol. 9, no. 46, pp. 2273–2280, 2015. HIKARI Ltd.

[11] M. Geethanjali, J. Angela Jennifa Sujana, and T. Revathi, “Ensuring truthfulness for scheduling multi-objective real time tasks in multi cloud environments,” in Proc. IEEE Int. Conf. ICRTIT, 2014.
