Table 1 Comparison of fault detection and tolerance techniques used in grids along with their advantages and disadvantages

From: Fault tolerance in computational grids: perspectives, challenges, and issues

System | Fault detection technique | Types of faults detected | Fault tolerance technique | Advantages | Disadvantages
Buyya and Murshed (2002), Klutke et al. (2003) | Heartbeat monitor | Host failure, network failure | Resubmit the failed job | Generic failure detection | Cannot handle user-defined exceptions
Buyya and Murshed (2002), Coulouris et al. (2001) | GRRP | Task crash failure | Retry | Task crash failure detection through protocols | Cannot handle user-defined exceptions
Alvisi and Marzullo (1998), Hussain et al. (2006) | Pinging | Task failure | Checkpoint recovery | Application-level fault tolerance | Cannot discern between task failure and network failure
Townend and Xu (2003) | Polling | Host crash, network crash | Retry on same machine | Provides security, job management, and fault tolerance | Retries only on the same machine; cannot detect task crash failures
Buyya and Murshed (2002), de Lemos (2006) | Generic heartbeat mechanism | Host crash, task crash, and network failure | Retry on another available machine | Load balancing, heartbeat mechanism, retry on another machine | Does not support diverse failure recovery mechanisms
CoG Kits, Guimaraes and de Melo (2011) | N/A | N/A | N/A | Security, resource discovery, and resource management | Failure detection is hard-coded; ignores fault tolerance
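Several rows of the table pair a timeout-based liveness check (heartbeat monitor, generic heartbeat mechanism) with job resubmission, and they differ mainly in where the job is retried. The following is a minimal Python sketch of that pattern, assuming a single fixed timeout and a simple round-robin choice of replacement host. `HeartbeatMonitor`, `reschedule`, and `FakeClock` are hypothetical names; the code does not reproduce the implementation of any cited system.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class HeartbeatMonitor:
    """Declares a host failed when its last heartbeat is older than `timeout` seconds."""
    timeout: float
    clock: Callable[[], float] = time.monotonic  # injectable so tests can control time
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, host: str) -> None:
        # Record the arrival time of a periodic heartbeat from `host`.
        self.last_seen[host] = self.clock()

    def failed_hosts(self) -> List[str]:
        now = self.clock()
        return sorted(h for h, t in self.last_seen.items() if now - t > self.timeout)

    def alive_hosts(self) -> List[str]:
        failed = set(self.failed_hosts())
        return sorted(h for h in self.last_seen if h not in failed)


def reschedule(assignment: Dict[str, str], monitor: HeartbeatMonitor,
               same_machine: bool = False) -> Dict[str, Optional[str]]:
    """Return a new job -> host plan after heartbeat-based failure detection.

    same_machine=True keeps each job on its original host ("retry on same
    machine"), so a job on a crashed host stays stuck there.
    same_machine=False moves it to a healthy host ("retry on another
    available machine") in round-robin order; None means no healthy host remains.
    """
    failed = set(monitor.failed_hosts())
    alive = monitor.alive_hosts()
    plan: Dict[str, Optional[str]] = {}
    moved = 0  # counts reassigned jobs to drive the round-robin choice
    for job, host in sorted(assignment.items()):
        if same_machine or host not in failed:
            plan[job] = host
        elif alive:
            plan[job] = alive[moved % len(alive)]
            moved += 1
        else:
            plan[job] = None
    return plan


class FakeClock:
    """Manually advanced clock so the example is deterministic."""
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    monitor = HeartbeatMonitor(timeout=5.0, clock=clock)
    for h in ("hostA", "hostB", "hostC"):
        monitor.beat(h)          # all hosts report at t = 0
    clock.now = 4.0
    monitor.beat("hostA")        # hostB stays silent from here on
    monitor.beat("hostC")
    clock.now = 7.0
    jobs = {"job1": "hostA", "job2": "hostB"}
    print(monitor.failed_hosts())                          # ['hostB']
    print(reschedule(jobs, monitor))                       # {'job1': 'hostA', 'job2': 'hostA'}
    print(reschedule(jobs, monitor, same_machine=True))    # {'job1': 'hostA', 'job2': 'hostB'}
```

Because the monitor only observes missing heartbeats, it cannot tell a crashed host from an unreachable one. With `same_machine=True` a job on a silent host is never moved, which illustrates why retrying on the same machine is listed as a drawback.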