Rate Limiting and Backoff

Most APIs enforce rate limits. Exceed them and you get 429 Too Many Requests or, worse, a temporary ban. The goal is to avoid hitting the limit in the first place.

Reading rate limit headers

Most APIs communicate limits through response headers. The names vary, but the pattern is consistent.

# Inspect rate limit headers (GitHub example)
curl -sI https://api.github.com/users/octocat \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  | grep -iE 'x-ratelimit|retry-after'

Common headers:

| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests left before the limit resets |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds to wait (or an HTTP-date to wait until) before retrying, typically on 429 responses |
| X-Rate-Limit-Reset | Alternative spelling (Cloudflare, others) |
| RateLimit-Policy | IETF draft; describes the policy itself |

Parsing the reset timestamp

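X-RateLimit-Reset is a Unix epoch timestamp. Subtract the current time to get the seconds remaining:

# -i matters: HTTP/2 servers send header names in lowercase (grep -P requires GNU grep)
reset=$(curl -sI "$API_URL" | grep -ioP 'x-ratelimit-reset: \K\d+')
now=$(date +%s)
wait_seconds=$(( reset - now ))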
echo "Rate limit resets in ${wait_seconds}s"
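
The same calculation can be wrapped in a reusable helper. The sketch below is one possible shape, assuming the response headers were saved to a file (for example with curl -D). The names header_value and seconds_until_reset are hypothetical, not part of any tool. It matches header names case-insensitively, strips the trailing carriage return, and clamps the result at zero in case the window has already reset.

```bash
#!/usr/bin/env bash
# Sketch: seconds until the rate-limit window resets, read from saved headers.
# header_value and seconds_until_reset are hypothetical helper names.

# Print the value of header $1 from the header dump in file $2.
header_value() {
  awk -v name="$1" '
    BEGIN { name = tolower(name) }
    {
      line = $0
      sub(/\r$/, "", line)                 # HTTP header lines end in CRLF
      i = index(line, ":")
      if (i > 0 && tolower(substr(line, 1, i - 1)) == name) {
        value = substr(line, i + 1)
        sub(/^[ \t]+/, "", value)          # drop whitespace after the colon
        print value
        exit
      }
    }
  ' "$2"
}

# Print seconds until reset for header file $1, never negative.
# $2 optionally overrides the current time (useful for testing).
seconds_until_reset() {
  local reset now wait_seconds
  reset=$(header_value "x-ratelimit-reset" "$1")
  now="${2:-$(date +%s)}"
  case "$reset" in
    ''|*[!0-9]*) echo 0; return 0 ;;       # header missing or not a number
  esac
  wait_seconds=$(( reset - now ))
  if [ "$wait_seconds" -lt 0 ]; then wait_seconds=0; fi
  echo "$wait_seconds"
}
```

A caller would typically run curl -sD headers.txt -o body.json "$API_URL" and then sleep "$(seconds_until_reset headers.txt)".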

Exponential backoff

When you receive a 429 or 5xx, wait before retrying. Double the wait each time to avoid hammering a struggling server.

#!/usr/bin/env bash
# Exponential backoff with configurable ceiling

api_call_with_backoff() {
  local url="$1"
  local max_retries="${2:-5}"
  local base_delay="${3:-1}"
  local max_delay="${4:-60}"
  local attempt=0
  local response http_code body delay

  while (( attempt < max_retries )); do
    # -w appends the status code on its own line after the body
    response=$(curl -sw '\n%{http_code}' "$url")
    http_code=$(printf '%s\n' "$response" | tail -n 1)
    body=$(printf '%s\n' "$response" | sed '$d')

    case "$http_code" in
      2[0-9][0-9])
        printf '%s\n' "$body"
        return 0
        ;;
      429|5[0-9][0-9])
        attempt=$((attempt + 1))
        # Skip the sleep once the retry budget is spent
        (( attempt >= max_retries )) && break
        delay=$(( base_delay * (2 ** (attempt - 1)) ))
        (( delay > max_delay )) && delay=$max_delay
        echo "Attempt ${attempt}/${max_retries}: HTTP ${http_code}, retrying in ${delay}s" >&2
        sleep "$delay"
        ;;
      *)
        echo "HTTP ${http_code}: not retryable" >&2
        echo "$body" >&2
        return 1
        ;;
    esac
  done

  echo "Exhausted ${max_retries} retries" >&2
  return 1
}

The ceiling (max_delay) prevents absurd waits. Without it, and with the default 1-second base delay, the tenth retry would sleep for 512 seconds.
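
To see how the ceiling shapes the schedule, the delay arithmetic can be pulled out into a small function. This is a sketch only: backoff_delay is a hypothetical helper, and it doubles in a loop rather than using 2 ** n so that it can stop as soon as the ceiling is reached.

```bash
#!/usr/bin/env bash
# Sketch: delay in seconds before retry number $1 (1-based).
# backoff_delay is a hypothetical helper; defaults mirror api_call_with_backoff.

backoff_delay() {
  local attempt="$1"
  local base_delay="${2:-1}"
  local max_delay="${3:-60}"
  local delay="$base_delay"
  local i=1

  # Double once per previous attempt, stopping early at the ceiling
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max_delay" ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done

  if [ "$delay" -gt "$max_delay" ]; then delay="$max_delay"; fi
  echo "$delay"
}
```

With the defaults, attempts 1 through 8 yield 1, 2, 4, 8, 16, 32, 60, and 60 seconds. Raising the ceiling far enough (for example backoff_delay 10 1 100000) shows the uncapped value of 512.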

Pre-emptive throttling

Reacting to 429s means you have already annoyed the server. A better approach is to check the remaining quota after each call and sleep before the next one when it runs low.

#!/usr/bin/env bash
# Pre-emptive throttle: sleep if remaining calls are low

throttled_request() {
  local url="$1"
  local threshold="${2:-10}"  # Start sleeping below this many remaining

  # Make the request, capture headers and body
  local tmpfile
  tmpfile=$(mktemp)
  local body
  body=$(curl -sD "$tmpfile" "$url")
  local remaining
  # -i: HTTP/2 responses carry lowercase header names
  remaining=$(grep -ioP 'x-ratelimit-remaining: \K\d+' "$tmpfile" || echo "999")
  local reset
  reset=$(grep -ioP 'x-ratelimit-reset: \K\d+' "$tmpfile" || echo "0")
  rm -f "$tmpfile"

  echo "$body"

  if (( remaining < threshold )); then
    local now
    now=$(date +%s)
    local wait=$(( reset - now ))
    (( wait > 0 )) && {
      echo "Rate limit low (${remaining} remaining), sleeping ${wait}s" >&2
      sleep "$wait"
    }
  fi
}
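
throttled_request sleeps for the whole remainder of the window once the quota runs low. A gentler variation, which is an assumption of this sketch and not what the function above does, is to spread the remaining calls evenly across the time left. The helper name pace_delay is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: seconds to pause between calls so the remaining quota lasts until reset.
# pace_delay is a hypothetical helper, not part of throttled_request.

pace_delay() {
  local remaining="$1"                 # value of X-RateLimit-Remaining
  local reset="$2"                     # value of X-RateLimit-Reset (epoch seconds)
  local now="${3:-$(date +%s)}"        # override for testing
  local window=$(( reset - now ))

  if [ "$window" -le 0 ]; then
    echo 0                             # window already reset: no need to wait
  elif [ "$remaining" -le 0 ]; then
    echo "$window"                     # quota exhausted: wait for the reset
  else
    # Ceiling division so the pacing never overshoots the quota
    echo $(( (window + remaining - 1) / remaining ))
  fi
}
```

For example, with 10 calls left and 100 seconds until reset, pace_delay suggests pausing 10 seconds between calls.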

Per-endpoint vs global rate limits

Some APIs enforce limits at multiple levels:

| Scope | Example |
|---|---|
| Global | 5,000 requests/hour across all endpoints (GitHub) |
| Per-endpoint | 30 requests/minute to /search, 5,000/hour elsewhere (GitHub Search) |
| Per-resource | 30 writes/second per DNS zone (Cloudflare) |
| Per-user | Each API key has its own quota |
| Per-IP | Unauthenticated requests share a pool by source IP |

When scripting against an API with per-endpoint limits, track counters per path, not just globally.
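
One way to track calls per path is a timestamped log file that is queried before each request. The sketch below assumes a simple sliding window. CALL_LOG, record_call, and calls_in_window are hypothetical names, and paths are assumed to contain no spaces.

```bash
#!/usr/bin/env bash
# Sketch: count recent calls per path using a timestamped log file.
# CALL_LOG, record_call, and calls_in_window are hypothetical names.

CALL_LOG="${CALL_LOG:-$(mktemp)}"

# Append "<epoch> <path>" for a request to path $1.
# $2 optionally overrides the current time (useful for testing).
record_call() {
  printf '%s %s\n' "${2:-$(date +%s)}" "$1" >> "$CALL_LOG"
}

# Print how many calls hit path $1 during the last $2 seconds.
# $3 optionally overrides the current time.
calls_in_window() {
  awk -v path="$1" -v window="$2" -v now="${3:-$(date +%s)}" '
    $2 == path && $1 > now - window { count++ }
    END { print count + 0 }
  ' "$CALL_LOG"
}
```

Before calling a limited endpoint, a script could compare calls_in_window /search 60 against that endpoint's per-minute limit and sleep if the count is too close to it.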

Common rate limits by provider

| Provider | Limit | Window |
|---|---|---|
| GitHub (authenticated) | 5,000 requests | 1 hour |
| GitHub (unauthenticated) | 60 requests | 1 hour |
| Cloudflare | 1,200 requests | 5 minutes |
| Cisco ISE ERS | No published limit (but concurrent connection limits exist) | N/A |
| AWS (varies by service) | Typically 10-100 TPS | Per second |
| Slack Web API | 1+ request/minute (Tier 1) to 100+/minute (Tier 4) | Varies by method tier |
| Meraki | 10 requests/second | Per second (per org) |
| Google APIs | 10 queries/second (default) | Per second |

These are baseline defaults. Many providers offer higher limits for enterprise tiers or upon request. Always check the specific API documentation for current values.
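
A limit and its window translate directly into a minimum spacing between requests. The sketch below shows that arithmetic. min_interval_ms is a hypothetical helper, and it assumes requests are spread evenly, which real providers do not require.

```bash
#!/usr/bin/env bash
# Sketch: minimum milliseconds between requests for a given limit and window.
# min_interval_ms is a hypothetical helper name.

min_interval_ms() {
  local limit="$1"             # requests allowed per window
  local window_seconds="$2"    # window length in seconds
  # Ceiling division so the spacing never allows more than the limit
  echo $(( (window_seconds * 1000 + limit - 1) / limit ))
}
```

For the authenticated GitHub limit, min_interval_ms 5000 3600 gives 720, and for the Cloudflare limit, min_interval_ms 1200 300 gives 250. Note that sub-second sleep values such as sleep 0.72 work with GNU sleep but are not guaranteed by POSIX.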

Handling Retry-After

Some APIs return a Retry-After header on 429 responses. When present, it is the most reliable signal, so use it instead of computing your own delay.

retry_after=$(grep -ioP 'retry-after: \K\d+' "$headers_file" || echo "")
if [[ -n "$retry_after" ]]; then
  sleep "$retry_after"
else
  # Fall back to exponential backoff
  sleep "$(( base_delay * (2 ** attempt) ))"
fi

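Retry-After may also be an HTTP-date (Thu, 09 Apr 2026 12:00:00 GMT). For robustness, parse both formats:

# Assumes $retry_after holds the raw header value, not just the digits
retry_after=${retry_after%$'\r'}   # strip the CR that ends each HTTP header line
if [[ "$retry_after" =~ ^[0-9]+$ ]]; then
  sleep "$retry_after"
else
  # HTTP-date form (date -d is GNU-specific)
  target=$(date -d "$retry_after" +%s 2>/dev/null || echo "0")
  now=$(date +%s)
  (( target > now )) && sleep "$(( target - now ))"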
fi
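
The pieces above can be combined into a single decision: honour Retry-After when it is usable, otherwise fall back to capped exponential backoff. This is a sketch under stated assumptions. next_delay is a hypothetical helper, the 1-second base and 60-second ceiling are assumed defaults, and HTTP-date parsing relies on GNU date.

```bash
#!/usr/bin/env bash
# Sketch: choose the next retry delay in seconds.
# next_delay is a hypothetical helper; base_delay and max_delay are assumed defaults.

next_delay() {
  local raw="$1"                     # raw Retry-After value, may be empty
  local attempt="$2"                 # number of failed attempts so far
  local now="${3:-$(date +%s)}"      # override for testing
  local base_delay=1
  local max_delay=60
  local value target delay i

  value=$(printf '%s' "$raw" | tr -d '\r')
  case "$value" in
    '') ;;                           # no header: use backoff below
    *[!0-9]*)
      # Not purely digits: try it as an HTTP-date (date -d is GNU-specific)
      target=$(date -d "$value" +%s 2>/dev/null || echo 0)
      if [ "$target" -gt "$now" ]; then
        echo $(( target - now ))
        return 0
      fi
      ;;                             # unparseable or already past: use backoff below
    *)
      echo "$value"                  # delay-seconds form
      return 0
      ;;
  esac

  # Fallback: base_delay * 2^attempt, capped at max_delay
  delay=$base_delay
  i=0
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max_delay" ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
  if [ "$delay" -gt "$max_delay" ]; then delay=$max_delay; fi
  echo "$delay"
}
```

A retry loop could then call sleep "$(next_delay "$retry_after" "$attempt")" regardless of which form, if any, the server sent.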