Shell Best Practices
Core Rules
╔══════════════════════════════════════════════════════════════╗
║ ESSENTIAL RULES ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ Rule #1: ALWAYS quote strings with special characters ║
║ Rule #2: ALWAYS use $() instead of backticks ║
║ ║
║ Why? Because unquoted strings and backticks cause: ║
║ • Production outages from unexpected shell expansion ║
║ • Security vulnerabilities from command injection ║
║ • Data loss from glob pattern matching ║
║ • Script failures from word splitting ║
║ ║
║ This section will save you from these disasters. ║
║ ║
╚══════════════════════════════════════════════════════════════╝
Rule #1: Quoting to Prevent Shell Expansion
Special Characters That MUST Be Quoted
┌─────────────────────────────────────────────────────────────┐
│ SHELL SPECIAL CHARACTERS REFERENCE │
├─────────────────────────────────────────────────────────────┤
│ │
│ Character │ Meaning │ Example Problem │
│ ──────────────────────────────────────────────────────────│
│ ? │ Single char wildcard │ file?.txt matches│
│ * │ Multi char wildcard │ *.log expands │
│ & │ Background process │ cmd & runs bg │
│ ; │ Command separator │ cmd1; cmd2 │
│ | │ Pipe operator │ cmd1 | cmd2 │
│ $ │ Variable expansion │ $var expands │
│ < │ Input redirection │ < file reads │
│ > │ Output redirection │ > file writes │
│ ` │ Command substitution │ `cmd` runs cmd │
│ ' │ Strong quote │ Literal string │
│ " │ Weak quote │ Allows $var │
│ \ │ Escape character │ Escapes next │
│ ( ) │ Subshell │ (cmd) runs sub │
│ { } │ Command grouping │ {cmd1;cmd2} │
│ [ ] │ Test/character class │ [a-z] matches │
│ ~ │ Home directory │ ~/file expands │
│ ! │ History/negate │ !cmd repeats │
│ # │ Comment │ #comment │
│ SPACE/TAB │ Word splitting │ Splits args │
│ NEWLINE │ Command terminator │ Ends command │
│ │
└─────────────────────────────────────────────────────────────┘
Quoting Rules: Single vs Double Quotes
# Single quotes: EVERYTHING is literal
name='John'
echo 'Hello $name'    # prints: Hello $name   (no expansion)
echo 'Cost: $100'     # prints: Cost: $100
echo 'File: *.txt'    # prints: File: *.txt   (no glob)
# Double quotes: variables and $() expand; globs and word splitting are suppressed
name="John"
echo "Hello $name"    # prints: Hello John
echo "Cost: \$100"    # prints: Cost: $100    (escaped $ stays literal)
echo "File: *.txt"    # prints: File: *.txt
# No quotes: DANGEROUS - full word splitting and glob expansion
name="John Doe"
echo Hello $name      # "John" and "Doe" become two separate arguments
echo Cost: $100       # parsed as $1 followed by 00 - usually prints: Cost: 00
echo File: *.txt      # *.txt expands to every matching file in the directory
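The difference between the three forms can be checked directly. A quick sketch: `count_args` is a throwaway helper invented for this demo (not a standard command) that reports how many arguments an expansion actually produces.

```shell
# count_args: demo helper that prints its argument count
count_args() { echo "$#"; }

name="John Doe"
unquoted=$(count_args $name)    # word splitting: "John" and "Doe" arrive separately
quoted=$(count_args "$name")    # quoting keeps the value as a single argument

echo "unquoted=$unquoted quoted=$quoted"
```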
Real-World Examples: Why Quoting Matters
Example 1: Railway Deployment Gone Wrong
# ❌ WRONG - Will fail mysteriously
DATABASE_URL=postgresql://user:pass@host:5432/db?sslmode=require
psql $DATABASE_URL -c "SELECT 1"
# What happens:
# 1. The assignment itself is safe, but the unquoted $DATABASE_URL passed to
#    psql undergoes glob expansion: ? is a single-character wildcard
# 2. If anything on disk matches the pattern, the URL is silently rewritten;
#    under failglob/nullglob the argument errors out or disappears entirely
# 3. Connection fails with a cryptic error
# ✅ CORRECT - Always quote URLs
DATABASE_URL="postgresql://user:pass@host:5432/db?sslmode=require"
psql "$DATABASE_URL" -c "SELECT 1"
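You can reproduce the `?` hazard safely in a scratch directory; the file name below is invented purely to give the wildcard something to match.

```shell
# Create a scratch dir containing one file the ? wildcard can match
dir=$(mktemp -d)
touch "$dir/dbXsslmode=require"

arg="$dir/db?sslmode=require"
unquoted=$(echo $arg)       # glob expansion silently rewrites the "URL"
quoted=$(echo "$arg")       # quoting keeps it literal

rm -rf "$dir"
echo "$unquoted vs $quoted"
```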
Example 2: File Path with Spaces
# ❌ WRONG - the assignment stops at the first space: the shell sets
# backup_dir=/var/backups/my, then tries to RUN 'backup' with argument 'files'
backup_dir=/var/backups/my backup files
cd $backup_dir
# ✅ CORRECT - Quote paths
backup_dir="/var/backups/my backup files"
cd "$backup_dir"
Example 3: API Calls
# ❌ WRONG - API call runs in background!
curl https://api.example.com/data?user=john&limit=10
# Shell interprets:
# curl https://api.example.com/data?user=john (in background)
# limit=10 (tries to set variable)
# ✅ CORRECT - Quote the URL
curl "https://api.example.com/data?user=john&limit=10"
Example 4: SQL Queries
# ❌ WRONG - Shell expands $1, $2 as variables
psql "$DATABASE_URL" -c "SELECT * FROM users WHERE id = $1 AND status = $2"
# ✅ CORRECT - Quote the query
psql "$DATABASE_URL" -c 'SELECT * FROM users WHERE id = $1 AND status = $2'
# Or use double quotes and escape
psql "$DATABASE_URL" -c "SELECT * FROM users WHERE id = \$1 AND status = \$2"
Rule #2: Command Substitution - $() vs Backticks
Why $() is Better
┌─────────────────────────────────────────────────────────────┐
│              COMMAND SUBSTITUTION COMPARISON                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Feature           │ Backticks `cmd`   │ $() Syntax        │
│  ──────────────────────────────────────────────────────────│
│  Readability       │ ❌ Hard to spot   │ ✅ Very clear     │
│  Nesting           │ ❌ Need escaping  │ ✅ Easy nesting   │
│  Modern            │ ❌ Deprecated     │ ✅ POSIX standard │
│  Syntax highlight  │ ❌ Poor support   │ ✅ Good support   │
│  Error messages    │ ❌ Confusing      │ ✅ Clear          │
│  Recommended       │ ❌ No             │ ✅ Yes            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Basic Usage
# OLD STYLE - Backticks (DON'T USE)
current_user=`whoami`
current_dir=`pwd`
file_count=`ls | wc -l`
# NEW STYLE - $() (ALWAYS USE THIS)
current_user=$(whoami)
current_dir=$(pwd)
file_count=$(ls | wc -l)
Nested Command Substitution
# ❌ OLD STYLE - Confusing, needs escaping
files=`ls \`pwd\`/src`
home_files=`find \`echo $HOME\` -name "*.txt"`
# ✅ NEW STYLE - Clear and readable
files=$(ls "$(pwd)/src")
home_files=$(find "$(echo "$HOME")" -name "*.txt")
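A minimal nesting check you can run anywhere — the inner $() finishes first and its output feeds the outer pipeline, with no escaping needed:

```shell
# Inner substitution produces "hello"; the outer pipeline upper-cases it
upper=$(echo "$(echo hello)" | tr 'a-z' 'A-Z')
echo "$upper"
```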
Real Production Examples
# Example 1: Timestamped backups
# ❌ OLD
backup_file="backup-`date +%Y%m%d-%H%M%S`.sql"
# ✅ NEW
backup_file="backup-$(date +%Y%m%d-%H%M%S).sql"
# Example 2: Get service URL from Railway
# ❌ OLD
prod_url=`railway variables --kv | grep RAILWAY_SERVICE | cut -d'=' -f2`
# ✅ NEW (-f2- keeps everything after the FIRST '=', so values containing '=' survive)
prod_url=$(railway variables --kv | grep "RAILWAY_SERVICE" | cut -d'=' -f2-)
# Example 3: Count database rows
# ❌ OLD
row_count=`psql "$DATABASE_URL" -t -c "SELECT COUNT(*) FROM projects"`
# ✅ NEW
row_count=$(psql "$DATABASE_URL" -t -c "SELECT COUNT(*) FROM projects")
# Example 4: Find process PID
# ❌ OLD
pid=`ps aux | grep nginx | grep -v grep | awk '{print $2}'`
# ✅ NEW
pid=$(ps aux | grep "nginx" | grep -v "grep" | awk '{print $2}')
# ✅ SIMPLER - pgrep avoids the grep-excluding-grep dance entirely
pid=$(pgrep -x nginx)
Combining Quoting and Command Substitution
# ✅ PERFECT - Both rules applied
timestamp=$(date +%Y%m%d-%H%M%S)
backup_file="backup-${timestamp}.sql"
pg_dump "$DATABASE_URL" > "$backup_file" 2> "${backup_file}.errors"
# Why this is safe:
# 1. $() for command substitution ✓
# 2. Quotes around variables ✓
# 3. Quotes around file paths ✓
# 4. ${} for variable clarity ✓
# ✅ PERFECT - Complex real-world example
load-secrets production applications && \
psql "$DATABASE_PUBLIC_URL" -c "$(cat /path/to/query.sql)" \
> "results-$(date +%Y%m%d).txt" \
2> "errors-$(date +%Y%m%d).log"
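Point 4 above (${} for variable clarity) is more than style: when the character after the variable is valid in a name, the braces decide which variable gets read. A sketch with two deliberately similar names:

```shell
name="backup"
name_v2="other"                # a second, unrelated variable

without_braces="$name_v2"      # reads the variable literally named name_v2
with_braces="${name}_v2"       # reads name, then appends the text "_v2"

echo "$without_braces / $with_braces"
```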
When NOT to Quote
# Don't quote when you WANT word splitting
files="file1.txt file2.txt file3.txt"
rm $files
# Don't quote when you WANT glob expansion
rm *.tmp
# Don't quote array elements
files=("file1.txt" "file2.txt" "file3.txt")
rm "${files[@]}"
# BUT if unsure: ALWAYS QUOTE. Safer to over-quote than under-quote!
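The array rule matters most when elements contain spaces — a quick count shows why quoted array expansion is safe where flat-string splitting is not:

```shell
# Quoted array expansion: each element stays whole, even with a space inside
files=("one file.txt" "two.txt")
count=0
for f in "${files[@]}"; do count=$((count + 1)); done

# The same names in a flat string split on EVERY space
joined="one file.txt two.txt"
words=0
for w in $joined; do words=$((words + 1)); done

echo "array elements: $count, split words: $words"
```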
Production-Safe Script Template
#!/bin/bash
# Production-grade script with all best practices
set -euo pipefail
IFS=$'\n\t'
# Constants (always quoted)
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly LOG_FILE="/var/log/myscript-$(date +%Y%m%d).log"
readonly TIMESTAMP="$(date '+%Y-%m-%d %H:%M:%S')"   # script start time
# Functions with proper quoting (timestamp computed per call, not frozen at startup)
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}
error() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $*" | tee -a "$LOG_FILE" >&2
}
# Command substitution with quoting
get_service_status() {
local service="$1"
local status
status=$(systemctl is-active "$service" 2>&1) || {
error "Failed to check status of: $service"
return 1
}
echo "$status"
}
# Main logic
main() {
local database_url="${DATABASE_URL:-}"
if [ -z "$database_url" ]; then
error "DATABASE_URL not set"
return 1
fi
# Proper quoting in all commands
local backup_file="backup-$(date +%Y%m%d-%H%M%S).sql"
log "Starting backup to: $backup_file"
if pg_dump "$database_url" > "$backup_file" 2> "${backup_file}.errors"; then
log "Backup successful: $backup_file"
else
error "Backup failed - see ${backup_file}.errors"
return 1
fi
}
main "$@"
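The capture-or-fail idiom in get_service_status (assign via $(), react with || { ... }) works with any command; a runnable sketch of the same shape using wc, which exists everywhere — get_word_count is an invented demo function:

```shell
# Same pattern as get_service_status: declare, then assign-or-fail
get_word_count() {
    local file="$1"
    local count
    count=$(wc -w < "$file" 2>/dev/null) || {
        echo "cannot read: $file" >&2
        return 1
    }
    echo "$count"
}

tmp=$(mktemp)
printf 'one two three' > "$tmp"
words=$(get_word_count "$tmp")
rm -f "$tmp"
echo "words=$words"
```

Note that `local count` and the assignment are separate lines: `local count=$(...)` would mask the command's exit status with local's own.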
Stream Redirection Fundamentals
Understanding File Descriptors
┌─────────────────────────────────────────────────────┐
│                  LINUX I/O STREAMS                  │
├─────────────────────────────────────────────────────┤
│                                                     │
│  File Descriptor 0: stdin (standard input)          │
│    ┌──────────┐                                     │
│    │ Keyboard │──▶ FD 0 ──▶ [Process]               │
│    └──────────┘                                     │
│                                                     │
│  File Descriptor 1: stdout (standard output)        │
│                            ┌──────────┐             │
│    [Process] ──▶ FD 1 ──▶  │ Terminal │             │
│                            └──────────┘             │
│                                                     │
│  File Descriptor 2: stderr (standard error)         │
│                            ┌──────────┐             │
│    [Process] ──▶ FD 2 ──▶  │ Terminal │             │
│                            └──────────┘             │
│                                                     │
└─────────────────────────────────────────────────────┘
Redirection Operators (With Proper Quoting!)
# Redirect stdout to file (overwrite)
command > "file.txt"
command 1> "file.txt"
# Redirect stdout to file (append)
command >> "file.txt"
# Redirect stderr to file
command 2> "errors.txt"
# Redirect both stdout and stderr to file
command &> "output.txt"
command > "output.txt" 2>&1
# Redirect stderr to stdout (for piping)
command 2>&1 | grep "pattern"
# Redirect stdout to file, stderr to different file
command > "output.txt" 2> "errors.txt"
# Send output to black hole
command > /dev/null
command 2> /dev/null
command &> /dev/null
# Tee: show output AND save to file
command | tee "output.txt"
command 2>&1 | tee "output.txt"
# Real production example with proper quoting
timestamp=$(date +%Y%m%d-%H%M%S)
pg_dump "$DATABASE_URL" \
> "backup-${timestamp}.sql" \
2> "backup-errors-${timestamp}.log"
Why Order Matters (Shell Processes Left to Right!)
# ❌ WRONG - stderr still goes to terminal
command 2>&1 > "file.txt"
# Why? Shell processes redirections left-to-right:
# 1. 2>&1 - "redirect stderr to wherever stdout currently goes" (terminal)
# 2. >"file.txt" - "now redirect stdout to file" (but stderr already set)
# Result: stdout goes to file, stderr goes to terminal
# ✅ CORRECT - both go to file
command > "file.txt" 2>&1
# Why? Correct order:
# 1. >"file.txt" - "redirect stdout to file"
# 2. 2>&1 - "redirect stderr to wherever stdout goes" (the file)
# Result: both stdout and stderr go to file
# ✅ MODERN/BETTER - same result, clearer intent
command &> "file.txt"
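The left-to-right rule is easy to verify: run a function that writes one line to each stream and see where the lines land. In the wrong-order case the outer `> "$f2"` stands in for "the terminal" (emit and the temp files are demo scaffolding):

```shell
emit() { echo OUT; echo ERR >&2; }

f1=$(mktemp); f2=$(mktemp); f3=$(mktemp)

# Wrong order: 2>&1 copies stderr to where stdout CURRENTLY points (f2);
# the later > "$f1" moves only stdout, so ERR never reaches f1
( emit 2>&1 > "$f1" ) > "$f2"

# Correct order: stdout to f3 first, then stderr follows it
emit > "$f3" 2>&1

wrong_stdout=$(cat "$f1")      # only OUT
escaped_err=$(cat "$f2")       # ERR, stranded where stdout used to point
both_lines=$(grep -c . "$f3")  # both lines made it
rm -f "$f1" "$f2" "$f3"
echo "$wrong_stdout / $escaped_err / $both_lines"
```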
Production Examples with Quoting
# ✅ Railway deployment with full logging
timestamp=$(date +%Y%m%d-%H%M%S)
railway up \
> "deploy-${timestamp}.log" \
2> "deploy-errors-${timestamp}.log"
# ✅ Database backup with error capture
backup_file="backup-$(date +%Y%m%d-%H%M%S).sql"
pg_dump "$DATABASE_URL" \
> "$backup_file" \
2> "${backup_file}.errors"
# ✅ API call with response and error logging
api_response="response-$(date +%Y%m%d-%H%M%S).json"
curl "https://api.example.com/data?user=john&limit=10" \
> "$api_response" \
2> "${api_response}.errors"
Security & Incident Response
1. Initial Compromise Detection (With Safe Quoting)
# Find recently modified files (potential backdoors)
output_file="/tmp/recent-changes-$(date +%Y%m%d).txt"
find / -type f -mtime -7 -ls 2>/dev/null | \
grep -v "/proc\|/sys\|/dev" > "$output_file"
# Find files modified in last 24 hours
find /etc /usr/bin /usr/sbin /var/www -type f -mtime -1 2>/dev/null
# Find SUID/SGID binaries (privilege escalation)
suid_file="/tmp/suid-sgid-binaries-$(date +%Y%m%d).txt"
find / -type f \( -perm -4000 -o -perm -2000 \) -ls 2>/dev/null | \
tee "$suid_file"
# Compare against known good list
current_suid="/tmp/current-suid-$(date +%Y%m%d).txt"
known_good="/var/baseline/known-good-suid.txt"
find / -type f -perm -4000 2>/dev/null | sort > "$current_suid"
diff "$known_good" "$current_suid"
Real Scenario: Post-Breach Forensics (Production-Safe)
#!/bin/bash
# Incident response: Capture system state
# All variables properly quoted!
set -euo pipefail
readonly TIMESTAMP=$(date +%Y%m%d-%H%M%S)
readonly IR_DIR="/var/log/incident-${TIMESTAMP}"
mkdir -p "$IR_DIR"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "${IR_DIR}/incident.log"
}
log "Starting incident response at $TIMESTAMP"
# Network connections (properly quoted filenames)
log "Capturing active network connections..."
netstat -tupan > "${IR_DIR}/network-connections.txt" 2>&1
ss -tupan >> "${IR_DIR}/network-connections.txt" 2>&1
# Running processes
log "Capturing running processes..."
ps auxwww > "${IR_DIR}/processes.txt" 2>&1
ps -eo pid,user,cmd,start_time --sort=-start_time > "${IR_DIR}/processes-timeline.txt" 2>&1
# Open files
log "Capturing open files..."
lsof > "${IR_DIR}/open-files.txt" 2>&1
# Users and authentication
log "Capturing user sessions..."
who > "${IR_DIR}/logged-users.txt" 2>&1
last -F > "${IR_DIR}/login-history.txt" 2>&1
lastb -F > "${IR_DIR}/failed-logins.txt" 2>&1
# Cron jobs (backdoor persistence)
log "Capturing scheduled tasks..."
{
for user in $(cut -f1 -d: /etc/passwd); do
echo "=== Crontab for $user ==="
crontab -u "$user" -l 2>&1 || echo "No crontab for $user"
echo ""
done
} > "${IR_DIR}/crontabs.txt"
cat /etc/crontab /etc/cron.d/* >> "${IR_DIR}/system-crontabs.txt" 2>&1
# Recent files (properly quoted paths)
log "Finding recently modified files..."
recent_files="${IR_DIR}/recent-files.txt"
find / -type f -mtime -7 2>/dev/null | \
grep -v "/proc\|/sys\|/dev" > "$recent_files"
# Suspicious locations
log "Checking suspicious locations..."
ls -laR /tmp /var/tmp /dev/shm > "${IR_DIR}/tmp-directories.txt" 2>&1
# Package verification (if available)
if command -v debsums &>/dev/null; then
log "Verifying package integrity..."
debsums -c > "${IR_DIR}/package-integrity.txt" 2>&1
fi
# Create archive (properly quoted)
log "Creating incident response archive..."
archive_file="${IR_DIR}.tar.gz"
tar czf "$archive_file" "$IR_DIR" 2>&1
log "Incident response complete!"
log "Data collected in: $IR_DIR"
log "Archive created: $archive_file"
2. Live Network Monitoring (Quoted Commands)
# Monitor established connections in real-time
# (-tupn, NOT -tulpn: the -l flag shows only LISTENING sockets, which are never ESTABLISHED)
watch -n 2 'netstat -tupn 2>&1 | grep "ESTABLISHED"'
# Track new connections (properly quoted log file)
log_file="/tmp/connection-monitoring-$(date +%Y%m%d).log"
while true; do
netstat -tupn 2>&1 | grep "ESTABLISHED" | \
awk '{print $5,$7}' | sort | uniq
sleep 5
done | tee "$log_file"
# Find listening services not in expected ports
unexpected_file="/tmp/unexpected-listeners-$(date +%Y%m%d).txt"
netstat -tulpn 2>/dev/null | \
grep "LISTEN" | \
grep -v ":22\|:80\|:443\|:5432\|:6379" | \
tee "$unexpected_file"
# Real-time connection tracking with details
ss -tupn 2>&1 | grep -v "127.0.0.1" | \
awk 'NR>1 {print $5,$6}' | \
sort | uniq -c | sort -nr
3. Rootkit Detection (Production-Safe)
#!/bin/bash
# Rootkit detection with proper quoting
set -euo pipefail
readonly SCAN_DIR="/tmp/rootkit-scan-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$SCAN_DIR"
# Check for hidden processes
ps aux | awk '{print $2}' | sort -n > "${SCAN_DIR}/ps-pids.txt"
ls -l /proc | awk 'NR>1 {print $9}' | grep '^[0-9]' | sort -n > "${SCAN_DIR}/proc-pids.txt"
diff "${SCAN_DIR}/ps-pids.txt" "${SCAN_DIR}/proc-pids.txt" > "${SCAN_DIR}/pid-differences.txt" || true
# Check for hidden files in common locations
find /tmp /var/tmp /dev/shm -name ".*" > "${SCAN_DIR}/hidden-files.txt" 2>&1
# Verify system binaries (properly quoted paths)
binary_hashes="${SCAN_DIR}/binary-hashes.txt"
which -a ps ls netstat ss find | xargs md5sum > "$binary_hashes" 2>&1
# Check for LD_PRELOAD hijacking
echo "LD_PRELOAD: ${LD_PRELOAD:-<not set>}" > "${SCAN_DIR}/ld-preload.txt"
grep -r "LD_PRELOAD" /etc 2>/dev/null >> "${SCAN_DIR}/ld-preload.txt" || true
# List loaded kernel modules
lsmod | sort > "${SCAN_DIR}/kernel-modules.txt"
echo "Rootkit scan complete. Results in: $SCAN_DIR"
4. Log Analysis for Security (With Quoting)
# Failed SSH login attempts (properly quoted output file)
ssh_failures="/tmp/ssh-failures-$(date +%Y%m%d).txt"
grep "Failed password" /var/log/auth.log 2>/dev/null | \
awk '{print $(NF-3)}' | sort | uniq -c | sort -nr | head -20 \
> "$ssh_failures"
# Successful logins
successful_logins="/tmp/successful-logins-$(date +%Y%m%d).txt"
grep "Accepted" /var/log/auth.log 2>/dev/null | \
awk '{print $1,$2,$3,$9,$11}' | tail -50 \
> "$successful_logins"
# Privilege escalation attempts
priv_escalation="/tmp/privilege-escalation-$(date +%Y%m%d).txt"
grep -E "sudo|su\[" /var/log/auth.log 2>/dev/null | tail -100 \
> "$priv_escalation"
# Web server attacks (if applicable)
web_attacks="/tmp/web-attacks-$(date +%Y%m%d).txt"
grep -E "\.\.\/|union.*select|script.*alert" /var/log/apache2/access.log 2>/dev/null | \
awk '{print $1}' | sort | uniq -c | sort -nr \
> "$web_attacks"
Network Diagnostics
1. Connection Troubleshooting (Cisco → Linux, Properly Quoted)
# Layer 1: Interface status
ip link show
interface="eth0"
ip -s link show "$interface"
# Layer 2: ARP table
ip neigh show
arp -n
# Layer 3: Routing table
ip route show
route -n
netstat -rn
# Layer 4: Active connections
ss -tan
netstat -tan
# DNS resolution (properly quoted domain)
domain="example.com"
dig "$domain" +short
nslookup "$domain"
host "$domain"
# Full path trace
target_host="8.8.8.8"
traceroute -n "$target_host"
mtr --report --report-cycles 10 "$target_host"
Cisco ASA Equivalent Commands (With Quoting):
# ASA: show interface
interface="eth0"
ip addr show "$interface"
ip -s link show "$interface"
# ASA: show route
ip route show table all
route -n
# ASA: show conn
ss -tupn
netstat -tupn
# ASA: show xlate (NAT translations)
conntrack -L 2>/dev/null
# ASA: show access-list hitcounts
iptables -L -v -n
nft list ruleset
# ASA: packet-tracer (capture specific host traffic)
target_host="192.168.1.10"
tcpdump -i any -nn host "$target_host"
2. Advanced Network Debugging (Production-Safe)
# Capture traffic on specific interface (properly quoted filenames)
interface="eth0"
port="443"
capture_file="/tmp/capture-$(date +%Y%m%d-%H%M%S).pcap"
traffic_log="/tmp/https-traffic-$(date +%Y%m%d).txt"
tcpdump -i "$interface" -nn -vv port "$port" 2>&1 | tee "$traffic_log"
# Capture and save for Wireshark analysis
tcpdump -i any -s0 -w "$capture_file" \
"port 5432 or port 6379" 2>&1
# Check for packet loss (properly quoted log)
ping_log="/tmp/ping-test-$(date +%Y%m%d).txt"
target="8.8.8.8"
ping -c 100 "$target" 2>&1 | \
tee "$ping_log" | \
grep -E "transmitted|loss"
# Test port connectivity (properly quoted variables)
test_port() {
local host="$1"
local port="$2"
if timeout 5 bash -c "cat < /dev/null > /dev/tcp/${host}/${port}" 2>/dev/null; then
echo "✅ ${host}:${port} is open"
return 0
else
echo "❌ ${host}:${port} is closed or filtered"
return 1
fi
}
# Usage
test_port "192.168.1.1" "80"
test_port "database.example.com" "5432"
3. Firewall Analysis (With Proper Quoting)
# List all firewall rules (properly quoted output files)
timestamp=$(date +%Y%m%d-%H%M%S)
firewall_rules="/tmp/firewall-rules-${timestamp}.txt"
nat_rules="/tmp/nat-rules-${timestamp}.txt"
iptables -L -n -v --line-numbers > "$firewall_rules" 2>&1
iptables -t nat -L -n -v > "$nat_rules" 2>&1
# Count packets per rule
watch -n 5 'iptables -L -n -v | grep -v "^Chain\|^target"'
# Export firewall config (properly quoted backup file)
backup_file="/tmp/iptables-backup-$(date +%Y%m%d).rules"
iptables-save > "$backup_file" 2>&1
# For nftables (modern replacement)
nft_backup="/tmp/nftables-backup-$(date +%Y%m%d).conf"
nft list ruleset > "$nft_backup" 2>&1
Container & Cloud Operations
1. Docker Management (Production-Safe Quoting)
# List containers with status (properly quoted format)
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" 2>&1
# Find containers using specific port
port="5432"
docker ps --filter "publish=${port}" --format "{{.Names}}: {{.Ports}}" 2>&1
# Container resource usage (quoted log file)
stats_file="/tmp/docker-stats-$(date +%Y%m%d-%H%M%S).txt"
docker stats --no-stream 2>&1 | tee "$stats_file"
# Inspect container networking (properly quoted container name)
container_name="domus_postgres"
docker inspect "$container_name" | jq '.[0].NetworkSettings' 2>&1
# Real-time container logs with errors highlighted
docker logs -f "$container_name" 2>&1 | grep --color=always -E "ERROR|WARN|$"
# Export container logs (quoted filenames)
log_file="/tmp/${container_name}-logs-$(date +%Y%m%d).txt"
docker logs "$container_name" > "$log_file" 2>&1
# Container shell (troubleshooting)
docker exec -it "$container_name" /bin/bash 2>&1 || \
docker exec -it "$container_name" /bin/sh 2>&1
# Copy files from container (properly quoted paths)
source_path="/path/to/file"
dest_path="/tmp/extracted-file-$(date +%Y%m%d)"
docker cp "${container_name}:${source_path}" "$dest_path" 2>&1
Production Docker Monitoring Script (All Quoted):
#!/bin/bash
# Monitor Docker containers with proper quoting
set -euo pipefail
readonly LOG_FILE="/var/log/docker-monitor.log"
readonly CHECK_INTERVAL=60
log() {
local timestamp
timestamp=$(date '+%Y-%m-%d %H:%M:%S')
echo "[${timestamp}] $*" | tee -a "$LOG_FILE"
}
while true; do
# Check for stopped containers (properly quoted)
stopped_containers=$(docker ps -a --filter "status=exited" --format "{{.Names}}" 2>&1)
if [ -n "$stopped_containers" ]; then
log "ALERT: Stopped containers: $stopped_containers"
fi
# Check for high memory usage (properly parsed)
docker stats --no-stream 2>&1 | \
awk 'NR>1 {gsub("%","",$7); if ($7 > 80) print $2, $7"%"}' | \
while read -r container mem; do
log "WARNING: ${container} using ${mem} memory"
done
# Check for restarting containers
restarting=$(docker ps --filter "status=restarting" --format "{{.Names}}" 2>&1)
if [ -n "$restarting" ]; then
log "CRITICAL: Restarting containers: $restarting"
fi
sleep "$CHECK_INTERVAL"
done
2. Railway Operations (Your Platform, Properly Quoted!)
#!/bin/bash
# Railway deployment with proper quoting
set -euo pipefail
readonly TIMESTAMP=$(date +%Y%m%d-%H%M%S)
readonly LOG_DIR="${HOME}/deployment-logs"
readonly DEPLOY_LOG="${LOG_DIR}/deploy-${TIMESTAMP}.log"
readonly ERROR_LOG="${LOG_DIR}/deploy-errors-${TIMESTAMP}.log"
mkdir -p "$LOG_DIR"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$DEPLOY_LOG"
}
error() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $*" | tee -a "$ERROR_LOG" >&2
}
# Get Railway logs (properly quoted)
log "Fetching Railway logs..."
railway logs --tail 100 2>&1 | tee "${LOG_DIR}/railway-logs-${TIMESTAMP}.txt"
# Check Railway status
log "Checking Railway status..."
railway status 2>&1 | tee -a "$DEPLOY_LOG"
# Get environment variables (properly quoted grep pattern)
log "Checking environment variables..."
railway variables --kv 2>&1 | \
grep -E "REDIS|DATABASE|NODE_ENV" | \
tee -a "$DEPLOY_LOG"
# Deploy with full logging (properly quoted redirection)
log "Starting deployment..."
if railway up > "${DEPLOY_LOG}.railway" 2> "${ERROR_LOG}.railway"; then
log "Deployment successful!"
else
error "Deployment failed - check ${ERROR_LOG}.railway"
exit 1
fi
Database Administration
1. PostgreSQL Operations (All Properly Quoted!)
# Connection testing (properly quoted URL)
psql "$DATABASE_URL" -c "SELECT version();" 2>&1
# Database size (properly quoted output file)
db_size_report="/tmp/db-size-$(date +%Y%m%d).txt"
psql "$DATABASE_URL" -c "
SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;" 2>&1 \
> "$db_size_report"
# Table sizes
table_size_report="/tmp/table-sizes-$(date +%Y%m%d).txt"
psql "$DATABASE_URL" -c "
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;" 2>&1 \
> "$table_size_report"
# Active connections
connections_report="/tmp/active-connections-$(date +%Y%m%d).txt"
psql "$DATABASE_URL" -c "
SELECT
pid,
usename,
application_name,
client_addr,
state,
query_start,
state_change
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;" 2>&1 \
> "$connections_report"
# Long-running queries
long_queries="/tmp/long-queries-$(date +%Y%m%d).txt"
psql "$DATABASE_URL" -c "
SELECT
pid,
now() - query_start AS duration,
query,
state
FROM pg_stat_activity
WHERE state != 'idle'
AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY duration DESC
LIMIT 10;" 2>&1 | tee "$long_queries"
# Backup database (CRITICAL: properly quoted filenames!)
backup_timestamp=$(date +%Y%m%d-%H%M%S)
backup_file="/tmp/backup-${backup_timestamp}.sql"
error_log="/tmp/backup-errors-${backup_timestamp}.log"
pg_dump "$DATABASE_URL" \
--no-owner \
--no-acl \
> "$backup_file" \
2> "$error_log"
# Verify backup (properly quoted file check)
if [ -s "$backup_file" ]; then
echo "✅ Backup created successfully: $backup_file"
ls -lh "$backup_file"
else
echo "❌ Backup failed - check errors: $error_log"
cat "$error_log"
exit 1
fi
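The verification step hinges on `[ -s file ]`, which is true only when the file exists AND is non-empty; a throwaway file shows both sides:

```shell
f=$(mktemp)                    # exists, but zero bytes
if [ -s "$f" ]; then before=yes; else before=no; fi

printf 'data' > "$f"           # now it has content
if [ -s "$f" ]; then after=yes; else after=no; fi

rm -f "$f"
echo "before=$before after=$after"
```

This is why the backup check catches the "pg_dump created the file but wrote nothing" failure mode that a plain `[ -f ]` would miss.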
2. Redis Operations (Your Stack, Properly Quoted!)
# Connect and get info (properly quoted URL and output)
redis_info="/tmp/redis-info-$(date +%Y%m%d).txt"
redis-cli -u "$REDIS_URL" INFO 2>&1 | tee "$redis_info"
# Monitor commands in real-time (properly quoted log file)
monitor_log="/tmp/redis-monitor-$(date +%Y%m%d-%H%M%S).log"
redis-cli -u "$REDIS_URL" MONITOR 2>&1 | tee "$monitor_log"
# Check memory usage
redis-cli -u "$REDIS_URL" INFO MEMORY 2>&1
# List keys sample (CAREFUL in production! Properly quoted output)
keys_sample="/tmp/redis-keys-sample-$(date +%Y%m%d).txt"
redis-cli -u "$REDIS_URL" --scan --pattern "*" 2>&1 | \
head -100 > "$keys_sample"
# Get key count
key_count=$(redis-cli -u "$REDIS_URL" DBSIZE 2>&1)
echo "Redis key count: $key_count"
# Check latency
redis-cli -u "$REDIS_URL" --latency 2>&1
# Test connectivity with timing
echo "Testing Redis connectivity..."
time redis-cli -u "$REDIS_URL" PING 2>&1
Complete Production Deployment Script
#!/bin/bash
# Production deployment - Railway/Domus Digitalis
# ALL BEST PRACTICES APPLIED!
set -euo pipefail
IFS=$'\n\t'
# ============================================================================
# CONSTANTS (All properly quoted!)
# ============================================================================
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly PROJECT_ROOT="/home/evanusmodestus/atelier/_projects/personal/domus-digitalis"
readonly TIMESTAMP=$(date +%Y%m%d-%H%M%S)
readonly LOG_DIR="${HOME}/deployment-logs"
readonly DEPLOY_LOG="${LOG_DIR}/deploy-${TIMESTAMP}.log"
readonly ERROR_LOG="${LOG_DIR}/deploy-errors-${TIMESTAMP}.log"
# ============================================================================
# LOGGING FUNCTIONS
# ============================================================================
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$DEPLOY_LOG"
}
error() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $*" | tee -a "$ERROR_LOG" >&2
}
success() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ✅ $*" | tee -a "$DEPLOY_LOG"
}
# ============================================================================
# CLEANUP
# ============================================================================
cleanup() {
local exit_code=$?
if [ $exit_code -ne 0 ]; then
error "Deployment failed with exit code: $exit_code"
error "Check logs: $ERROR_LOG"
else
success "Deployment completed successfully"
fi
# Clean up temp files (properly quoted globs)
rm -f "${PROJECT_ROOT}/apps/backend/"*.sql
}
trap cleanup EXIT
# ============================================================================
# PRE-FLIGHT CHECKS
# ============================================================================
preflight_checks() {
log "Running pre-flight checks..."
# Check we're in the right directory
if [ ! -d "$PROJECT_ROOT" ]; then
error "Project root not found: $PROJECT_ROOT"
return 1
fi
cd "$PROJECT_ROOT" || {
error "Cannot change to project root"
return 1
}
# Verify git status
if ! git status &>/dev/null; then
error "Not a git repository"
return 1
fi
# Check for uncommitted changes
if ! git diff-index --quiet HEAD -- 2>/dev/null; then
error "Uncommitted changes detected - commit first"
return 1
fi
# Verify Railway CLI
if ! command -v railway &>/dev/null; then
error "Railway CLI not found - install first"
return 1
fi
success "Pre-flight checks passed"
}
# ============================================================================
# BACKUP PRODUCTION DATABASE
# ============================================================================
backup_production() {
log "Backing up production database..."
local backup_dir="${HOME}/backups"
mkdir -p "$backup_dir"
local backup_file="${backup_dir}/prod-backup-${TIMESTAMP}.sql"
local backup_errors="${backup_dir}/prod-backup-errors-${TIMESTAMP}.log"
# Load production secrets and backup
if load-secrets production applications && \
pg_dump "$DATABASE_PUBLIC_URL" > "$backup_file" 2> "$backup_errors"; then
if [ -s "$backup_file" ]; then
success "Backup created: $backup_file ($(du -h "$backup_file" | cut -f1))"
else
error "Backup file is empty - check: $backup_errors"
return 1
fi
else
log "No existing production database to backup (or backup failed)"
# Don't fail deployment if no DB exists yet
fi
}
# ============================================================================
# PREPARE DATA EXPORT
# ============================================================================
prepare_data() {
log "Preparing data export from local Docker..."
# Verify Docker is running
if ! docker ps &>/dev/null; then
error "Docker is not running"
return 1
fi
# Export from local database
local sql_file="${PROJECT_ROOT}/apps/backend/production-seed.sql"
local container="domus_postgres"
if docker exec "$container" pg_dump -U domus_user domus_dev > "$sql_file" 2>&1; then
success "Data exported: $sql_file ($(du -h "$sql_file" | cut -f1))"
else
error "Failed to export data from Docker"
return 1
fi
# Verify SQL file
if [ ! -s "$sql_file" ]; then
error "SQL file is empty"
return 1
fi
# Quick sanity check (grep -c prints 0 even with no matches but exits non-zero;
# || true absorbs the exit status without adding a second "0" to the output)
local insert_count
insert_count=$(grep -c "INSERT INTO" "$sql_file" || true)
log "SQL file contains $insert_count INSERT statements"
}
# ============================================================================
# DEPLOY TO RAILWAY
# ============================================================================
deploy_railway() {
log "Deploying to Railway production..."
# Link to production
railway link <<EOF
domusdigitalis-production
production
ExpressJS-production
EOF
# Verify environment variables
log "Verifying environment variables..."
local redis_url
redis_url=$(railway variables --kv 2>&1 | grep "^REDIS_URL=" | cut -d'=' -f2-)
if [[ ! "$redis_url" =~ REDIS_PUBLIC_URL ]]; then
error "REDIS_URL is not set to \${{REDIS_PUBLIC_URL}}"
error "Current value: $redis_url"
return 1
fi
success "Environment variables verified"
# Deploy
log "Running railway up..."
if railway up 2>&1 | tee -a "$DEPLOY_LOG"; then
success "Railway deployment completed"
else
error "Railway deployment failed"
return 1
fi
# Wait for deployment to stabilize
log "Waiting 30 seconds for deployment to stabilize..."
sleep 30
}
# ============================================================================
# SEED DATABASE
# ============================================================================
seed_database() {
log "Seeding production database..."
local sql_file="${PROJECT_ROOT}/apps/backend/production-seed.sql"
if [ ! -f "$sql_file" ]; then
error "SQL file not found: $sql_file"
return 1
fi
# Import data (CRITICAL: chained with load-secrets!)
if load-secrets production applications && \
psql "$DATABASE_PUBLIC_URL" < "$sql_file" 2>&1 | tee -a "$DEPLOY_LOG"; then
success "Database seeded successfully"
else
error "Database seeding failed"
return 1
fi
# Verify data
log "Verifying database contents..."
local project_count
# -tA (tuples-only, unaligned) strips psql's padding so the numeric test below is reliable
project_count=$(load-secrets production applications && \
psql "$DATABASE_PUBLIC_URL" -tA -c "SELECT COUNT(*) FROM projects" 2>&1)
log "Projects in database: $project_count"
if [ "$project_count" -lt 1 ]; then
error "Database appears empty after seeding"
return 1
fi
success "Database verification passed"
}
# ============================================================================
# VERIFY DEPLOYMENT
# ============================================================================
verify_deployment() {
log "Verifying deployment..."
# Get production URL
local prod_url
prod_url=$(railway status 2>&1 | grep "URL:" | awk '{print $2}')
if [ -z "$prod_url" ]; then
error "Could not determine production URL"
return 1
fi
log "Production URL: $prod_url"
# Health check
log "Running health check..."
local health_response
health_response=$(curl -s "${prod_url}/api/health" 2>&1)
if echo "$health_response" | jq -e '.status == "ok"' &>/dev/null; then
success "Health check passed"
echo "$health_response" | jq . | tee -a "$DEPLOY_LOG"
else
error "Health check failed"
error "Response: $health_response"
return 1
fi
# Test API endpoints
log "Testing API endpoints..."
local projects_count
projects_count=$(curl -s "${prod_url}/api/projects?language=en" 2>&1 | jq '. | length' 2>&1)
log "API returned $projects_count projects"
success "All verification checks passed"
}
# ============================================================================
# CLEANUP TEMP FILES
# ============================================================================
cleanup_temp_files() {
log "Cleaning up temporary files..."
# Remove SQL file (security!)
local sql_file="${PROJECT_ROOT}/apps/backend/production-seed.sql"
if [ -f "$sql_file" ]; then
rm -f "$sql_file"
success "Removed temporary SQL file"
fi
# Verify it's gone
if [ -f "$sql_file" ]; then
error "Failed to remove SQL file: $sql_file"
return 1
fi
}
# ============================================================================
# MAIN
# ============================================================================
main() {
mkdir -p "$LOG_DIR"
log "=========================================="
log "Production Deployment Started"
log "Timestamp: $TIMESTAMP"
log "=========================================="
preflight_checks || exit 1
backup_production || exit 1
prepare_data || exit 1
deploy_railway || exit 1
seed_database || exit 1
verify_deployment || exit 1
cleanup_temp_files || exit 1
log "=========================================="
log "Deployment Complete!"
log "Logs: $DEPLOY_LOG"
log "=========================================="
}
main "$@"
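A subtlety worth isolating from prepare_data's sanity check: grep -c always prints a count (including 0), but exits non-zero when nothing matched, which would abort a `set -e` script mid-count. `|| true` absorbs the exit status without polluting the captured output:

```shell
tmp=$(mktemp)
printf 'INSERT INTO a ...;\nINSERT INTO b ...;\n' > "$tmp"
count=$(grep -c "INSERT INTO" "$tmp" || true)     # two matches

empty=$(mktemp)                                   # zero matches: grep prints 0, exits 1
zero=$(grep -c "INSERT INTO" "$empty" || true)    # || true swallows the failure

rm -f "$tmp" "$empty"
echo "count=$count zero=$zero"
```

Appending `|| echo "0"` instead would print a second zero, and the captured value would contain two lines.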
Quick Reference Card
Shell Best Practices Cheat Sheet
# ✅ ALWAYS QUOTE
url="https://api.example.com/data?user=john&limit=10"
curl "$url"
# ✅ ALWAYS USE $()
timestamp=$(date +%Y%m%d-%H%M%S)
backup_file="backup-${timestamp}.sql"
# ✅ QUOTE VARIABLES
pg_dump "$DATABASE_URL" > "$backup_file" 2> "${backup_file}.errors"
# ✅ QUOTE FILE PATHS
config_file="/etc/myapp/config.conf"
cat "$config_file"
# ✅ ARRAY ELEMENTS
files=("file1.txt" "file2.txt" "file3.txt")
for file in "${files[@]}"; do
echo "Processing: $file"
done
# ✅ COMMAND SUBSTITUTION
result=$(complex_command | grep "pattern" | awk '{print $2}')
# ✅ HERE DOCUMENTS (for multi-line strings)
# Note: 'EOF' is quoted, so $DATABASE_URL is written literally into the file;
# use an unquoted EOF delimiter if you want the variable expanded
cat > "config.yml" << 'EOF'
database:
  url: $DATABASE_URL
  pool: 10
EOF
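Whether the heredoc delimiter is quoted controls expansion of the body — both behaviors in one runnable sketch:

```shell
name="world"

# Quoted delimiter ('EOF'): the body is literal, $name survives as text
literal=$(cat << 'EOF'
hello $name
EOF
)

# Unquoted delimiter (EOF): the body expands like a double-quoted string
expanded=$(cat << EOF
hello $name
EOF
)

echo "$literal / $expanded"
```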
Stream Redirection Cheat Sheet
# Redirect stdout
command > "file.txt"
# Redirect stderr
command 2> "errors.txt"
# Redirect both (modern)
command &> "output.txt"
# Redirect both (traditional)
command > "output.txt" 2>&1
# Pipe both streams
command 2>&1 | grep "pattern"
# Show and save
command 2>&1 | tee "output.txt"
# Silence completely
command &> /dev/null