Table 1 Comparison of fault detection and tolerance techniques used in grids along with their advantages and disadvantages

From: Fault tolerance in computational grids: perspectives, challenges, and issues

| System | Fault detection technique | Types of faults detected | Fault tolerance technique | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Globus: Buyya and Murshed (2002), Klutke et al. (2003) | Heartbeat monitor | Host failure, network failure | Resubmit the failed job | Generic failure detection | Cannot handle user-defined exceptions |
| MDS-2: Buyya and Murshed (2002), Coulouris et al. (2001) | GRRP | Task crash failure | Retry | Task crash failure detection through protocols | Cannot handle user-defined exceptions |
| Legion: Alvisi and Marzullo (1998), Hussain et al. (2006) | Pinging | Task failure | Checkpoint recovery | Application-level fault tolerance | Cannot discern between task failure and network failure |
| Condor-G: Townend and Xu (2003) | Polling | Host crash, network crash | Retry on same machine | Provides security, management of jobs, and fault tolerance | Retries on the same machine; cannot detect task crash failure |
| NetSolve: Buyya and Murshed (2002), de Lemos (2006) | Generic heartbeat mechanism | Host crash, task crash, and network failure | Retry on another available machine | Load balancing, heartbeat mechanism, retry on another machine | Does not support diverse failure recovery mechanisms |
| CoG Kits: Guimaraes and de Melo (2011) | N/A | N/A | N/A | Security, discovery of resources, and management of resources | Failure detection is hard-coded; ignores fault tolerance |
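
To make the heartbeat detection and retry-on-another-machine entries in Table 1 concrete, the following is a minimal sketch, not the mechanism of any system listed. The names `HeartbeatMonitor`, `failed_hosts`, and `resubmit` are hypothetical, timestamps are passed in explicitly so the behaviour is deterministic, and the timeout value is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical detector: a host is suspected failed when its last
    heartbeat is older than `timeout` seconds."""
    timeout: float
    last_seen: dict = field(default_factory=dict)

    def beat(self, host: str, now: float) -> None:
        # Record the time of the most recent heartbeat from this host.
        self.last_seen[host] = now

    def failed_hosts(self, now: float) -> list:
        # Silence longer than the timeout is treated as a failure.
        return sorted(
            host for host, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


def resubmit(assignments: dict, failed: list, healthy: list) -> dict:
    """Move every job on a failed host to a healthy host, round-robin."""
    if not healthy:
        raise RuntimeError("no healthy host available for resubmission")
    result = dict(assignments)
    next_index = 0
    for job, host in sorted(assignments.items()):
        if host in failed:
            result[job] = healthy[next_index % len(healthy)]
            next_index += 1
    return result


monitor = HeartbeatMonitor(timeout=5.0)
monitor.beat("host-a", now=0.0)
monitor.beat("host-b", now=0.0)
monitor.beat("host-b", now=8.0)  # host-a stays silent after t=0

failed = monitor.failed_hosts(now=10.0)
plan = resubmit({"job-1": "host-a", "job-2": "host-b"}, failed, ["host-b"])
```

A detector of this kind only observes missing heartbeats, so it cannot by itself tell a crashed host from a broken network link, and it has no view of user-defined exceptions raised inside a task.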