Retry Strategies

Retrying is not always the right move. A retry on the wrong error repeats the damage: it resends a request that cannot succeed, or duplicates a write that already went through. This page covers when to retry, how to retry safely, and how to stop retrying when the situation is unrecoverable.

Which errors to retry

HTTP code            Retry?                      Reasoning
429                  Yes                         Rate limited: the server is telling you to slow down and try again (honor its Retry-After header if present)
500                  Yes (with caution)          Internal server error: may be transient, but a server bug fails the same way every time
502                  Yes                         Bad gateway: the upstream server may recover
503                  Yes                         Service unavailable: the server is overloaded or restarting
504                  Yes                         Gateway timeout: the upstream server was too slow
400                  No                          Bad request: your payload is wrong; retrying sends the same bad data
401                  No (refresh token first)    Unauthorized: retry only after re-authenticating
403                  No                          Forbidden: you lack permission; retrying will not grant it
404                  No                          Not found: the resource does not exist
409                  Maybe                       Conflict: retry if the conflict is transient (concurrent edit), not if it is permanent (duplicate key)
Network timeout      Yes (idempotent requests)   The request may or may not have reached the server; retry a POST only with an idempotency key
Connection refused   Yes (limited)               The server may be starting up, but give up quickly

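The rule: retry on transient server errors (5xx), rate limits (429), and network failures. Do not retry other client errors (4xx) unchanged: fix the request first, or re-authenticate for 401. A 409 caused by a concurrent edit is the common exception.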
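The rule can be captured in a small reusable function that the retry loops on this page could call. This is a minimal sketch: the name is_retryable is hypothetical, and retrying only 500, 502, 503, and 504 (rather than every 5xx code) is this sketch's assumption.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an HTTP status code is worth retrying, following
# the table above. Returns 0 (retry) or 1 (do not retry).
# "000" is what curl's %{http_code} reports when no HTTP response arrived
# (timeout, refused connection).

is_retryable() {
  local http_code="$1"

  case "$http_code" in
    429|500|502|503|504)
      # Rate limits and transient server errors
      return 0
      ;;
    000)
      # Network failure. Retrying is only safe if the request is idempotent
      # or carries an idempotency key; this function does not check that.
      return 0
      ;;
    *)
      # 4xx: fix the request (or re-authenticate for 401) first.
      # 409 is left to the caller. Other 5xx codes, such as
      # 501 Not Implemented, are not transient.
      return 1
      ;;
  esac
}
```

A retry loop would then branch on it, for example `if is_retryable "$http_code"; then ...; fi`, keeping the classification in one place.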

Exponential backoff with jitter

Pure exponential backoff (1s, 2s, 4s, 8s, ...) causes thundering herds: clients that failed at the same moment compute the same delays, so they all retry at the same moment again. Jitter randomizes each delay to spread retries over time.

#!/usr/bin/env bash
# Exponential backoff with full jitter

retry_with_jitter() {
  local url="$1"
  local method="${2:-GET}"
  local data="${3:-}"
  local max_retries=5
  local base_delay=1
  local max_delay=30
  local attempt=0
  local response http_code body ceiling delay

  while (( attempt <= max_retries )); do
    # --max-time bounds each attempt so a hung connection cannot stall the loop
    if [[ "$method" == "GET" ]]; then
      response=$(curl -s --max-time 30 -w '\n%{http_code}' "$url")
    else
      response=$(curl -s --max-time 30 -w '\n%{http_code}' -X "$method" -d "$data" \
        -H "Content-Type: application/json" "$url")
    fi

    # The last line is the status code; everything before it is the body
    http_code=$(tail -n 1 <<< "$response")
    body=$(sed '$d' <<< "$response")

    case "$http_code" in
      2[0-9][0-9])
        echo "$body"
        return 0
        ;;
      000|429|5[0-9][0-9])
        # 000 means no HTTP response at all (timeout, refused connection).
        # Resending a POST after 000 can duplicate it; see idempotency keys below.
        attempt=$((attempt + 1))
        if (( attempt > max_retries )); then
          echo "Exhausted ${max_retries} retries (last HTTP ${http_code})" >&2
          return 1
        fi
        # Full jitter: random value between 0 and the exponential ceiling
        ceiling=$(( base_delay * (2 ** (attempt - 1)) ))
        (( ceiling > max_delay )) && ceiling=$max_delay
        delay=$(( RANDOM % (ceiling + 1) ))
        echo "Retry ${attempt}/${max_retries}: HTTP ${http_code}, waiting ${delay}s" >&2
        sleep "$delay"
        ;;
      *)
        # Client errors and other non-retryable responses: return the body and fail
        echo "$body"
        return 1
        ;;
    esac
  done
}

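Three common jitter strategies exist. In the formulas, base is the initial delay, cap is the maximum delay, attempt counts from 0, and d = min(cap, base * 2^attempt) is the un-jittered exponential delay:

Strategy              Formula                                   Effect
Full jitter           random(0, d)                              Spreads retries across the whole interval from 0 to d
Equal jitter          d/2 + random(0, d/2)                      Guarantees at least half the computed delay
Decorrelated jitter   min(cap, random(base, prev_delay * 3))    Each delay grows from the previous delay rather than from the attempt count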

Full jitter works well in practice and is a good default. AWS recommends it in its Architecture Blog post on exponential backoff and jitter.
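The three formulas translate directly into shell arithmetic. This sketch works in whole seconds with bash's $RANDOM (adequate for small ranges, not for anything security-sensitive), and the function names are hypothetical. Because integer division rounds d/2 down, equal jitter here draws from floor(d/2) to d.

```shell
#!/usr/bin/env bash
# Sketch of the three jitter formulas, in whole seconds.
# base = initial delay, cap = maximum delay, attempt counts from 0.

# Integer in [low, high] from bash's $RANDOM (0..32767); fine for small ranges
rand_between() {
  local low="$1" high="$2"
  echo $(( low + RANDOM % (high - low + 1) ))
}

# d = min(cap, base * 2^attempt)
exp_delay() {
  local base="$1" cap="$2" attempt="$3"
  (( attempt > 30 )) && attempt=30   # avoid 64-bit overflow; cap is normally hit long before
  local d=$(( base * (2 ** attempt) ))
  (( d > cap )) && d=$cap
  echo "$d"
}

# Full jitter: random(0, d)
full_jitter() {
  local d
  d=$(exp_delay "$1" "$2" "$3")
  rand_between 0 "$d"
}

# Equal jitter: d/2 + random(0, d/2), kept within [floor(d/2), d]
equal_jitter() {
  local d half
  d=$(exp_delay "$1" "$2" "$3")
  half=$(( d / 2 ))
  echo $(( half + $(rand_between 0 $(( d - half ))) ))
}

# Decorrelated jitter: min(cap, random(base, prev_delay * 3))
# Pass the previous delay as the third argument; start with prev = base.
decorrelated_jitter() {
  local base="$1" cap="$2" prev="$3"
  local high=$(( prev * 3 ))
  (( high < base )) && high=$base
  local next
  next=$(rand_between "$base" "$high")
  (( next > cap )) && next=$cap
  echo "$next"
}
```

For example, `full_jitter 1 30 4` prints a value from 0 to 16, while `decorrelated_jitter` needs the caller to feed each result back in as the next `prev`.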

Circuit breaker pattern

A circuit breaker stops calling a failing endpoint after repeated failures. It prevents wasting time and quota on an endpoint that is clearly down.

Three states:

  • Closed: normal operation; requests pass through and failures are counted

  • Open: too many failures; requests fail immediately without calling the API

  • Half-open: after a cooldown, one test request is allowed through; success closes the circuit, failure opens it again
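The transitions between the three states fit in a small pure function with no network or file I/O, which makes the logic easy to test in isolation. This is a sketch: circuit_next, its event names, and the default threshold of 5 are hypothetical, and it counts consecutive failures (a success while closed resets the count).

```shell
#!/usr/bin/env bash
# Sketch: circuit breaker state transitions as a pure function.
# Usage: circuit_next STATE FAILURES EVENT [THRESHOLD]
#   STATE: closed | open | half-open
#   EVENT: success | failure | cooldown-elapsed
# Prints the new state and failure count, separated by a space.

circuit_next() {
  local state="$1" failures="$2" event="$3" threshold="${4:-5}"

  case "$state:$event" in
    closed:success)
      echo "closed 0"                 # success resets the consecutive count
      ;;
    closed:failure)
      failures=$(( failures + 1 ))
      if (( failures >= threshold )); then
        echo "open $failures"         # threshold reached: trip the breaker
      else
        echo "closed $failures"
      fi
      ;;
    open:cooldown-elapsed)
      echo "half-open $failures"      # allow one test request
      ;;
    half-open:success)
      echo "closed 0"                 # service recovered
      ;;
    half-open:failure)
      echo "open $failures"           # still failing: start a new cooldown
      ;;
    *)
      echo "$state $failures"         # no transition for this event
      ;;
  esac
}
```

A caller would persist the printed state and count between calls (for example, in a state file) and send a cooldown-elapsed event once the cooldown has passed.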

#!/usr/bin/env bash
# Simple circuit breaker using a state file
# (line 1: consecutive failure count, line 2: time of the last failure)

# One state file per API; set API_URL before sourcing. cksum is POSIX, unlike md5sum.
CIRCUIT_FILE="/tmp/circuit-$(printf '%s' "$API_URL" | cksum | cut -d' ' -f1)"
FAILURE_THRESHOLD=5
COOLDOWN_SECONDS=60

circuit_call() {
  local url="$1"
  local failures=0 last_failure=0 now response http_code body

  # Check circuit state
  if [[ -f "$CIRCUIT_FILE" ]]; then
    { read -r failures; read -r last_failure; } < "$CIRCUIT_FILE"
    now=$(date +%s)

    if (( failures >= FAILURE_THRESHOLD )); then
      local elapsed=$(( now - last_failure ))
      if (( elapsed < COOLDOWN_SECONDS )); then
        echo "Circuit OPEN: ${elapsed}s/${COOLDOWN_SECONDS}s cooldown" >&2
        return 1
      fi
      # Half-open: let this one request through as a test
      echo "Circuit HALF-OPEN: testing..." >&2
    fi
  fi

  # Make the request once, capturing body and status code together
  response=$(curl -s --max-time 30 -w '\n%{http_code}' "$url")
  http_code=$(tail -n 1 <<< "$response")
  body=$(sed '$d' <<< "$response")

  case "$http_code" in
    2[0-9][0-9])
      # Success: reset (close) the circuit
      rm -f "$CIRCUIT_FILE"
      echo "$body"
      return 0
      ;;
    000|429|5[0-9][0-9])
      # Failure, including no response at all (000): increment the counter.
      # A failed half-open test keeps the count above the threshold and reopens the circuit.
      failures=$(( failures + 1 ))
      printf '%d\n%d\n' "$failures" "$(date +%s)" > "$CIRCUIT_FILE"
      echo "Circuit failure ${failures}/${FAILURE_THRESHOLD}: HTTP ${http_code}" >&2
      return 1
      ;;
    *)
      echo "HTTP ${http_code}: not counted as a circuit failure" >&2
      return 1
      ;;
  esac
}

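For production automation, use a proper circuit breaker library. The shell implementation above demonstrates the concept for understanding and quick scripts; it does no file locking, so concurrent callers can race on the state file, and several of them can send half-open test requests at once.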

Idempotency keys for safe retries

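GET, PUT, and DELETE are idempotent under HTTP semantics: sending one twice leaves the server in the same state as sending it once (GET is additionally safe, meaning it changes nothing). POST is not idempotent, and PATCH is not guaranteed to be. If you retry a POST that actually succeeded but whose response was lost to a network error, you create a duplicate.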
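When a request times out, its method decides whether resending it is safe without an idempotency key. A minimal sketch that encodes the methods HTTP defines as idempotent (RFC 9110, section 9.2.2); the name is_idempotent_method is hypothetical, and it trusts the server to implement PUT and DELETE as the specification requires.

```shell
#!/usr/bin/env bash
# Sketch: methods HTTP defines as idempotent (RFC 9110, section 9.2.2).
# Returns 0 if resending the request after a lost response is safe
# without an idempotency key, 1 otherwise.

is_idempotent_method() {
  local method
  # Uppercase portably (bash 3.2 on macOS lacks ${var^^})
  method=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')

  case "$method" in
    GET|HEAD|OPTIONS|TRACE|PUT|DELETE) return 0 ;;
    *) return 1 ;;   # POST, PATCH, and unknown methods
  esac
}
```

A retry loop would resend after a timeout (curl's `000`) only when this returns 0 or the request carries an idempotency key.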

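Idempotency keys solve this. You generate a unique key for each logical operation and send it as a header. The server records the result of the first request with that key and, for any repeat, returns the stored result instead of executing the operation again.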

# Generate an idempotency key
idempotency_key=$(uuidgen)

# Use it on a POST request
curl -s -X POST "$API_URL/payments" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $idempotency_key" \
  -d '{"amount": 1000, "currency": "USD"}'

# Safe to retry with the SAME key -- server returns cached response
curl -s -X POST "$API_URL/payments" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $idempotency_key" \
  -d '{"amount": 1000, "currency": "USD"}'
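Idempotency keys only help if every retry of one logical operation reuses the same key. A minimal sketch that wraps the POST above in a bounded retry loop, assuming the API honors the Idempotency-Key header; post_with_idempotency and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: retry a POST, reusing ONE idempotency key for every attempt.
# Assumes the server deduplicates on the Idempotency-Key header.

post_with_idempotency() {
  local url="$1" payload="$2" max_retries="${3:-3}"
  local key attempt response http_code

  # Generate the key once per logical operation, outside the retry loop.
  # A new key per attempt would defeat deduplication.
  key=$(uuidgen)

  for (( attempt = 0; attempt <= max_retries; attempt++ )); do
    if (( attempt > 0 )); then
      sleep $(( RANDOM % (2 ** attempt + 1) ))   # full jitter, base 1s, no cap
    fi
    response=$(curl -s --max-time 30 -w '\n%{http_code}' -X POST "$url" \
      -H "Content-Type: application/json" \
      -H "Idempotency-Key: $key" \
      -d "$payload")
    http_code=$(tail -n 1 <<< "$response")

    case "$http_code" in
      2[0-9][0-9]) sed '$d' <<< "$response"; return 0 ;;
      000|429|5[0-9][0-9]) continue ;;           # safe to resend with the same key
      *) sed '$d' <<< "$response"; return 1 ;;  # client error: do not retry
    esac
  done
  echo "Gave up after ${max_retries} retries (last HTTP ${http_code})" >&2
  return 1
}
```

Usage: `post_with_idempotency "$API_URL/payments" '{"amount": 1000, "currency": "USD"}'`. Even a timeout (`000`) is retried here, because the shared key lets the server recognize a request that already went through.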

Providers that support idempotency keys:

  • Stripe: Idempotency-Key header

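  • AWS: client tokens (for example, the ClientToken parameter on EC2 RunInstances), which the SDKs generate automatically for operations that define one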

  • Shopify: Idempotency-Key header

  • Many payment and financial APIs

If the API does not support idempotency keys, use a read-before-write pattern:

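# Check if the resource already exists before creating (--get sends the data as a URL-encoded query)
existing=$(curl -s --get --data-urlencode "email=user@example.com" "${API_URL}/users" \
  | jq '.data | length')
# Create only on a confirmed zero; if the check failed, $existing is empty and we skip
if [[ "$existing" == "0" ]]; then
  curl -s -X POST "$API_URL/users" \
    -H "Content-Type: application/json" \
    -d '{"email": "user@example.com"}'
fi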

This is not perfectly safe: another client can create the same resource between the check and the create. It does prevent the most common duplicate, a retry after an earlier attempt succeeded but its response was lost.

Maximum retry limits

Always set a ceiling. Without one, a persistent 500 error produces infinite retries.

Reasonable defaults:

Scenario                   Max retries                 Reasoning
Interactive CLI            3                           User is waiting; fail fast with a clear message
Background automation      5-7                         More tolerance, but still bounded
Critical write operation   3 (with idempotency key)    Fewer retries, but safe to retry
Batch processing           10 (with circuit breaker)   Let the circuit breaker handle sustained failures

After exhausting retries, log the failure with enough context to diagnose: HTTP code, response body, URL, timestamp, and attempt count.
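A minimal sketch of such a record as a single line on stderr. The helper log_retry_exhausted and its key=value format are hypothetical; match whatever format your log pipeline parses.

```shell
#!/usr/bin/env bash
# Sketch: write one log line when retries are exhausted, with the fields
# listed above (timestamp, URL, HTTP code, attempt count, response body).

log_retry_exhausted() {
  local url="$1" http_code="$2" attempts="$3" body="$4"
  local timestamp
  timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)

  # Flatten newlines and truncate so one failure stays one readable line
  body=${body//$'\n'/ }
  body=${body:0:500}

  # body goes last because it may contain spaces
  printf 'retry_exhausted timestamp=%s url=%s http_code=%s attempts=%s body=%s\n' \
    "$timestamp" "$url" "$http_code" "$attempts" "$body" >&2
}
```

Called from a retry loop as, for example, `log_retry_exhausted "$url" "$http_code" "$((max_retries + 1))" "$body"`, counting the first request as an attempt.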