Table 1 Comparison of fault detection and tolerance techniques used in grids along with their advantages and disadvantages

From: Fault tolerance in computational grids: perspectives, challenges, and issues

| System | Fault detection technique | Types of faults detected | Fault tolerance technique | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Globus [Buyya and Murshed (2002), Klutke et al. (2003)] | Heartbeat monitor | Host failure, network failure | Resubmit the failed job | Generic failure detection | Cannot handle user-defined exceptions |
| MDS-2 [Buyya and Murshed (2002), Coulouris et al. (2001)] | GRRP | Task crash failure | Retry | Task crash failure detection through protocols | Cannot handle user-defined exceptions |
| Legion [Alvisi and Marzullo (1998), Hussain et al. (2006)] | Pinging | Task failure | Checkpoint recovery | Application-level fault tolerance | Cannot discern between task failure and network failure |
| Condor-G [Townend and Xu (2003)] | Polling | Host crash, network crash | Retry on same machine | Provides security, management of jobs, and fault tolerance | Retries on the same machine; cannot detect task crash failure |
| NetSolve [Buyya and Murshed (2002), de Lemos (2006)] | Generic heartbeat mechanism | Host crash, task crash, and network failure | Retry on another available machine | Load balancing, heartbeat mechanism, retry on another machine | Does not support diverse failure recovery mechanisms |
| CoG Kits [Guimaraes and de Melo (2011)] | N/A | N/A | N/A | Security, discovery of resources, and management of resources | Failure detection is hard-coded; ignores fault tolerance |
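
As an illustration of two of the techniques compared in Table 1 (a heartbeat mechanism for detection, and retry on another available machine for tolerance), the following is a minimal sketch in Python. It is not taken from Globus, NetSolve, or any other system in the table; the names `HeartbeatMonitor`, `submit_with_retry`, and the `timeout` parameter are hypothetical and chosen only for this example, and time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, Dict, List, Optional


class HeartbeatMonitor:
    """Hypothetical helper: remembers when each host last sent a heartbeat."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                  # seconds of silence tolerated
        self.last_seen: Dict[str, float] = {}   # host name -> last heartbeat time

    def beat(self, host: str, now: float) -> None:
        """Record a heartbeat received from `host` at time `now`."""
        self.last_seen[host] = now

    def is_alive(self, host: str, now: float) -> bool:
        """A host counts as alive if it sent a heartbeat within `timeout`."""
        last = self.last_seen.get(host)
        return last is not None and (now - last) <= self.timeout


def submit_with_retry(
    job: Callable[[str], None],
    hosts: List[str],
    monitor: HeartbeatMonitor,
    now: float,
) -> Optional[str]:
    """Try hosts in order; return the host that completed the job, or None."""
    for host in hosts:
        if not monitor.is_alive(host, now):
            # Missed heartbeat: a host or network failure is suspected.
            # A heartbeat alone cannot tell the two apart.
            continue
        try:
            job(host)
            return host
        except RuntimeError:
            # Task crash on this host: retry on another available machine.
            continue
    return None


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=5.0)
    monitor.beat("node-a", now=0.0)   # stale by the time the job is submitted
    monitor.beat("node-b", now=8.0)
    monitor.beat("node-c", now=9.0)

    def job(host: str) -> None:
        # Simulated task crash on one host, for illustration only.
        if host == "node-b":
            raise RuntimeError("task crashed")

    print(submit_with_retry(job, ["node-a", "node-b", "node-c"], monitor, now=10.0))
```

In this sketch a silent host is simply skipped, which mirrors the limitation noted in the table: a missed heartbeat or ping does not by itself reveal whether the host, the task, or the network failed.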