|
Digital Library of the European Council for Modelling and Simulation |
Title: |
The Median Resource Failure Checkpointing |
Authors: |
Suleman Khan, Khizar Hayat, Sajjad A. Madani, Samee U. Khan, Joanna Kolodziej |
Published in: |
(2012). ECMS 2012 Proceedings edited by: K. G. Troitzsch, M. Moehring, U. Lotzmann.
European Council for Modeling and Simulation. doi:10.7148/2012 ISBN: 978-0-9564944-4-3.
26th European Conference on Modelling and Simulation, Shaping reality through simulation,
Koblenz, Germany, May 29 – June 1, 2012 |
Citation format: |
Khan, S., Hayat, K., Madani, S. A., Khan, S. U., & Kolodziej, J. (2012). The Median Resource Failure Checkpointing.
ECMS 2012 Proceedings edited by: K. G. Troitzsch, M. Moehring, U. Lotzmann (pp. 483-489).
European Council for Modeling and Simulation. doi:10.7148/2012-0483-0489 |
DOI: |
http://dx.doi.org/10.7148/2012-0483-0489 |
Abstract: |
In grid computing, achieving a desirable level of fault tolerance
is linked with the proper utilization of resources and scheduling of jobs.
The literature offers two solutions to these two challenging tasks, viz.
checkpointing and replication. A checkpointing strategy is proposed that
uses the median of the failure intervals of the resources to decide the
checkpoint intervals for the given jobs. The strategy shows improved system
throughput, fewer job losses, and shorter job execution times while
eliminating unnecessary checkpoints. |
Full text: |