Shell Best Practices
Core Rules
╔══════════════════════════════════════════════════════════════╗
║ ESSENTIAL RULES ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ Rule #1: ALWAYS quote strings with special characters ║
║ Rule #2: ALWAYS use $() instead of backticks ║
║ ║
║ Why? Because unquoted strings and backticks cause: ║
║ • Production outages from unexpected shell expansion ║
║ • Security vulnerabilities from command injection ║
║ • Data loss from glob pattern matching ║
║ • Script failures from word splitting ║
║ ║
║ This section will save you from these disasters. ║
║ ║
╚══════════════════════════════════════════════════════════════╝
Rule #1: Quoting to Prevent Shell Expansion
Special Characters That MUST Be Quoted
┌─────────────────────────────────────────────────────────────┐
│ SHELL SPECIAL CHARACTERS REFERENCE │
├─────────────────────────────────────────────────────────────┤
│ │
│ Character │ Meaning │ Example Problem │
│ ──────────────────────────────────────────────────────────│
│ ? │ Single char wildcard │ file?.txt matches│
│ * │ Multi char wildcard │ *.log expands │
│ & │ Background process │ cmd & runs bg │
│ ; │ Command separator │ cmd1; cmd2 │
│ | │ Pipe operator │ cmd1 | cmd2 │
│ $ │ Variable expansion │ $var expands │
│ < │ Input redirection │ < file reads │
│ > │ Output redirection │ > file writes │
│ ` │ Command substitution │ `cmd` runs cmd │
│ ' │ Strong quote │ Literal string │
│ " │ Weak quote │ Allows $var │
│ \ │ Escape character │ Escapes next │
│ ( ) │ Subshell │ (cmd) runs sub │
│ { } │ Command grouping │ {cmd1;cmd2} │
│ [ ] │ Test/character class │ [a-z] matches │
│ ~ │ Home directory │ ~/file expands │
│ ! │ History/negate │ !cmd repeats │
│ # │ Comment │ #comment │
│ SPACE/TAB │ Word splitting │ Splits args │
│ NEWLINE │ Command terminator │ Ends command │
│ │
└─────────────────────────────────────────────────────────────┘
Quoting Rules: Single vs Double Quotes
# Single quotes: EVERYTHING is literal
name='John'
echo 'Hello $name'    # prints: Hello $name   (no expansion)
echo 'Cost: $100'     # prints: Cost: $100
echo 'File: *.txt'    # prints: File: *.txt   (no glob)
# Double quotes: variables and $() expand; globs and word splitting are suppressed
name="John"
echo "Hello $name"    # prints: Hello John
echo "Cost: \$100"    # prints: Cost: $100    (escaped $ stays literal)
echo "File: *.txt"    # prints: File: *.txt
# No quotes: DANGEROUS - full word splitting and glob expansion
name="John Doe"
echo Hello $name      # "John" and "Doe" become two separate arguments
echo Cost: $100       # parsed as $1 followed by 00 - usually prints: Cost: 00
echo File: *.txt      # *.txt expands to every matching file in the directory
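The difference between the three forms can be checked directly. A quick sketch: `count_args` is a throwaway helper invented for this demo (not a standard command) that reports how many arguments an expansion actually produces.

```shell
# count_args: demo helper that prints its argument count
count_args() { echo "$#"; }

name="John Doe"
unquoted=$(count_args $name)    # word splitting: "John" and "Doe" arrive separately
quoted=$(count_args "$name")    # quoting keeps the value as a single argument

echo "unquoted=$unquoted quoted=$quoted"
```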
Real-World Examples: Why Quoting Matters
Example 1: Railway Deployment Gone Wrong
# ❌ WRONG - Will fail mysteriously
DATABASE_URL=postgresql://user:pass@host:5432/db?sslmode=require
psql $DATABASE_URL -c "SELECT 1"
# What happens:
# 1. The assignment itself is safe, but the unquoted $DATABASE_URL passed to
#    psql undergoes glob expansion: ? is a single-character wildcard
# 2. If anything on disk matches the pattern, the URL is silently rewritten;
#    under failglob/nullglob the argument errors out or disappears entirely
# 3. Connection fails with a cryptic error
# ✅ CORRECT - Always quote URLs
DATABASE_URL="postgresql://user:pass@host:5432/db?sslmode=require"
psql "$DATABASE_URL" -c "SELECT 1"
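You can reproduce the `?` hazard safely in a scratch directory; the file name below is invented purely to give the wildcard something to match.

```shell
# Create a scratch dir containing one file the ? wildcard can match
dir=$(mktemp -d)
touch "$dir/dbXsslmode=require"

arg="$dir/db?sslmode=require"
unquoted=$(echo $arg)       # glob expansion silently rewrites the "URL"
quoted=$(echo "$arg")       # quoting keeps it literal

rm -rf "$dir"
echo "$unquoted vs $quoted"
```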
Example 2: File Path with Spaces
# ❌ WRONG - the assignment stops at the first space: the shell sets
# backup_dir=/var/backups/my, then tries to RUN 'backup' with argument 'files'
backup_dir=/var/backups/my backup files
cd $backup_dir
# ✅ CORRECT - Quote paths
backup_dir="/var/backups/my backup files"
cd "$backup_dir"
Example 3: API Calls
# ❌ WRONG - API call runs in background!
curl https://api.example.com/data?user=john&limit=10
# Shell interprets:
# curl https://api.example.com/data?user=john (in background)
# limit=10 (tries to set variable)
# ✅ CORRECT - Quote the URL
curl "https://api.example.com/data?user=john&limit=10"
Example 4: SQL Queries
# ❌ WRONG - Shell expands $1, $2 as variables
psql "$DATABASE_URL" -c "SELECT * FROM users WHERE id = $1 AND status = $2"
# ✅ CORRECT - Quote the query
psql "$DATABASE_URL" -c 'SELECT * FROM users WHERE id = $1 AND status = $2'
# Or use double quotes and escape
psql "$DATABASE_URL" -c "SELECT * FROM users WHERE id = \$1 AND status = \$2"
Rule #2: Command Substitution - $() vs Backticks
Why $() is Better
┌─────────────────────────────────────────────────────────────┐
│              COMMAND SUBSTITUTION COMPARISON                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Feature           │ Backticks `cmd`   │ $() Syntax        │
│  ──────────────────────────────────────────────────────────│
│  Readability       │ ❌ Hard to spot   │ ✅ Very clear     │
│  Nesting           │ ❌ Need escaping  │ ✅ Easy nesting   │
│  Modern            │ ❌ Deprecated     │ ✅ POSIX standard │
│  Syntax highlight  │ ❌ Poor support   │ ✅ Good support   │
│  Error messages    │ ❌ Confusing      │ ✅ Clear          │
│  Recommended       │ ❌ No             │ ✅ Yes            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Basic Usage
# OLD STYLE - Backticks (DON'T USE)
current_user=`whoami`
current_dir=`pwd`
file_count=`ls | wc -l`
# NEW STYLE - $() (ALWAYS USE THIS)
current_user=$(whoami)
current_dir=$(pwd)
file_count=$(ls | wc -l)
Nested Command Substitution
# ❌ OLD STYLE - Confusing, needs escaping
files=`ls \`pwd\`/src`
home_files=`find \`echo $HOME\` -name "*.txt"`
# ✅ NEW STYLE - Clear and readable
files=$(ls "$(pwd)/src")
home_files=$(find "$(echo "$HOME")" -name "*.txt")
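A minimal nesting check you can run anywhere — the inner $() finishes first and its output feeds the outer pipeline, with no escaping needed:

```shell
# Inner substitution produces "hello"; the outer pipeline upper-cases it
upper=$(echo "$(echo hello)" | tr 'a-z' 'A-Z')
echo "$upper"
```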
Real Production Examples
# Example 1: Timestamped backups
# ❌ OLD
backup_file="backup-`date +%Y%m%d-%H%M%S`.sql"
# ✅ NEW
backup_file="backup-$(date +%Y%m%d-%H%M%S).sql"
# Example 2: Get service URL from Railway
# ❌ OLD
prod_url=`railway variables --kv | grep RAILWAY_SERVICE | cut -d'=' -f2`
# ✅ NEW (-f2- keeps everything after the FIRST '=', so values containing '=' survive)
prod_url=$(railway variables --kv | grep "RAILWAY_SERVICE" | cut -d'=' -f2-)
# Example 3: Count database rows
# ❌ OLD
row_count=`psql "$DATABASE_URL" -t -c "SELECT COUNT(*) FROM projects"`
# ✅ NEW
row_count=$(psql "$DATABASE_URL" -t -c "SELECT COUNT(*) FROM projects")
# Example 4: Find process PID
# ❌ OLD
pid=`ps aux | grep nginx | grep -v grep | awk '{print $2}'`
# ✅ NEW
pid=$(ps aux | grep "nginx" | grep -v "grep" | awk '{print $2}')
# ✅ SIMPLER - pgrep avoids the grep-excluding-grep dance entirely
pid=$(pgrep -x nginx)
Combining Quoting and Command Substitution
# ✅ PERFECT - Both rules applied
timestamp=$(date +%Y%m%d-%H%M%S)
backup_file="backup-${timestamp}.sql"
pg_dump "$DATABASE_URL" > "$backup_file" 2> "${backup_file}.errors"
# Why this is safe:
# 1. $() for command substitution ✓
# 2. Quotes around variables ✓
# 3. Quotes around file paths ✓
# 4. ${} for variable clarity ✓
# ✅ PERFECT - Complex real-world example
load-secrets production applications && \
psql "$DATABASE_PUBLIC_URL" -c "$(cat /path/to/query.sql)" \
> "results-$(date +%Y%m%d).txt" \
2> "errors-$(date +%Y%m%d).log"
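Point 4 above (${} for variable clarity) is more than style: when the character after the variable is valid in a name, the braces decide which variable gets read. A sketch with two deliberately similar names:

```shell
name="backup"
name_v2="other"                # a second, unrelated variable

without_braces="$name_v2"      # reads the variable literally named name_v2
with_braces="${name}_v2"       # reads name, then appends the text "_v2"

echo "$without_braces / $with_braces"
```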
When NOT to Quote
# Don't quote when you WANT word splitting
files="file1.txt file2.txt file3.txt"
rm $files
# Don't quote when you WANT glob expansion
rm *.tmp
# Don't quote array elements
files=("file1.txt" "file2.txt" "file3.txt")
rm "${files[@]}"
# BUT if unsure: ALWAYS QUOTE. Safer to over-quote than under-quote!
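The array rule matters most when elements contain spaces — a quick count shows why quoted array expansion is safe where flat-string splitting is not:

```shell
# Quoted array expansion: each element stays whole, even with a space inside
files=("one file.txt" "two.txt")
count=0
for f in "${files[@]}"; do count=$((count + 1)); done

# The same names in a flat string split on EVERY space
joined="one file.txt two.txt"
words=0
for w in $joined; do words=$((words + 1)); done

echo "array elements: $count, split words: $words"
```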
Production-Safe Script Template
#!/bin/bash
# Production-grade script with all best practices
set -euo pipefail
IFS=$'\n\t'
# Constants (always quoted)
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly LOG_FILE="/var/log/myscript-$(date +%Y%m%d).log"
readonly TIMESTAMP="$(date '+%Y-%m-%d %H:%M:%S')"   # script start time
# Functions with proper quoting (timestamp computed per call, not frozen at startup)
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}
error() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $*" | tee -a "$LOG_FILE" >&2
}
# Command substitution with quoting
get_service_status() {
local service="$1"
local status
status=$(systemctl is-active "$service" 2>&1) || {
error "Failed to check status of: $service"
return 1
}
echo "$status"
}
# Main logic
main() {
local database_url="${DATABASE_URL:-}"
if [ -z "$database_url" ]; then
error "DATABASE_URL not set"
return 1
fi
# Proper quoting in all commands
local backup_file="backup-$(date +%Y%m%d-%H%M%S).sql"
log "Starting backup to: $backup_file"
if pg_dump "$database_url" > "$backup_file" 2> "${backup_file}.errors"; then
log "Backup successful: $backup_file"
else
error "Backup failed - see ${backup_file}.errors"
return 1
fi
}
main "$@"
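The capture-or-fail idiom in get_service_status (assign via $(), react with || { ... }) works with any command; a runnable sketch of the same shape using wc, which exists everywhere — get_word_count is an invented demo function:

```shell
# Same pattern as get_service_status: declare, then assign-or-fail
get_word_count() {
    local file="$1"
    local count
    count=$(wc -w < "$file" 2>/dev/null) || {
        echo "cannot read: $file" >&2
        return 1
    }
    echo "$count"
}

tmp=$(mktemp)
printf 'one two three' > "$tmp"
words=$(get_word_count "$tmp")
rm -f "$tmp"
echo "words=$words"
```

Note that `local count` and the assignment are separate lines: `local count=$(...)` would mask the command's exit status with local's own.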
Stream Redirection Fundamentals
Understanding File Descriptors
┌─────────────────────────────────────────────────────┐
│                  LINUX I/O STREAMS                  │
├─────────────────────────────────────────────────────┤
│                                                     │
│  File Descriptor 0: stdin (standard input)          │
│    ┌──────────┐                                     │
│    │ Keyboard │──▶ FD 0 ──▶ [Process]               │
│    └──────────┘                                     │
│                                                     │
│  File Descriptor 1: stdout (standard output)        │
│                            ┌──────────┐             │
│    [Process] ──▶ FD 1 ──▶  │ Terminal │             │
│                            └──────────┘             │
│                                                     │
│  File Descriptor 2: stderr (standard error)         │
│                            ┌──────────┐             │
│    [Process] ──▶ FD 2 ──▶  │ Terminal │             │
│                            └──────────┘             │
│                                                     │
└─────────────────────────────────────────────────────┘
Redirection Operators (With Proper Quoting!)
# Redirect stdout to file (overwrite)
command > "file.txt"
command 1> "file.txt"
# Redirect stdout to file (append)
command >> "file.txt"
# Redirect stderr to file
command 2> "errors.txt"
# Redirect both stdout and stderr to file
command &> "output.txt"
command > "output.txt" 2>&1
# Redirect stderr to stdout (for piping)
command 2>&1 | grep "pattern"
# Redirect stdout to file, stderr to different file
command > "output.txt" 2> "errors.txt"
# Send output to black hole
command > /dev/null
command 2> /dev/null
command &> /dev/null
# Tee: show output AND save to file
command | tee "output.txt"
command 2>&1 | tee "output.txt"
# Real production example with proper quoting
timestamp=$(date +%Y%m%d-%H%M%S)
pg_dump "$DATABASE_URL" \
> "backup-${timestamp}.sql" \
2> "backup-errors-${timestamp}.log"
Why Order Matters (Shell Processes Left to Right!)
# ❌ WRONG - stderr still goes to terminal
command 2>&1 > "file.txt"
# Why? Shell processes redirections left-to-right:
# 1. 2>&1 - "redirect stderr to wherever stdout currently goes" (terminal)
# 2. >"file.txt" - "now redirect stdout to file" (but stderr already set)
# Result: stdout goes to file, stderr goes to terminal
# ✅ CORRECT - both go to file
command > "file.txt" 2>&1
# Why? Correct order:
# 1. >"file.txt" - "redirect stdout to file"
# 2. 2>&1 - "redirect stderr to wherever stdout goes" (the file)
# Result: both stdout and stderr go to file
# ✅ MODERN/BETTER - same result, clearer intent
command &> "file.txt"
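The left-to-right rule is easy to verify: run a function that writes one line to each stream and see where the lines land. In the wrong-order case the outer `> "$f2"` stands in for "the terminal" (emit and the temp files are demo scaffolding):

```shell
emit() { echo OUT; echo ERR >&2; }

f1=$(mktemp); f2=$(mktemp); f3=$(mktemp)

# Wrong order: 2>&1 copies stderr to where stdout CURRENTLY points (f2);
# the later > "$f1" moves only stdout, so ERR never reaches f1
( emit 2>&1 > "$f1" ) > "$f2"

# Correct order: stdout to f3 first, then stderr follows it
emit > "$f3" 2>&1

wrong_stdout=$(cat "$f1")      # only OUT
escaped_err=$(cat "$f2")       # ERR, stranded where stdout used to point
both_lines=$(grep -c . "$f3")  # both lines made it
rm -f "$f1" "$f2" "$f3"
echo "$wrong_stdout / $escaped_err / $both_lines"
```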
Production Examples with Quoting
# ✅ Railway deployment with full logging
timestamp=$(date +%Y%m%d-%H%M%S)
railway up \
> "deploy-${timestamp}.log" \
2> "deploy-errors-${timestamp}.log"
# ✅ Database backup with error capture
backup_file="backup-$(date +%Y%m%d-%H%M%S).sql"
pg_dump "$DATABASE_URL" \
> "$backup_file" \
2> "${backup_file}.errors"
# ✅ API call with response and error logging
api_response="response-$(date +%Y%m%d-%H%M%S).json"
curl "https://api.example.com/data?user=john&limit=10" \
> "$api_response" \
2> "${api_response}.errors"
Security & Incident Response
1. Initial Compromise Detection (With Safe Quoting)
# Find recently modified files (potential backdoors)
output_file="/tmp/recent-changes-$(date +%Y%m%d).txt"
find / -type f -mtime -7 -ls 2>/dev/null | \
grep -v "/proc\|/sys\|/dev" > "$output_file"
# Find files modified in last 24 hours
find /etc /usr/bin /usr/sbin /var/www -type f -mtime -1 2>/dev/null
# Find SUID/SGID binaries (privilege escalation)
suid_file="/tmp/suid-sgid-binaries-$(date +%Y%m%d).txt"
find / -type f \( -perm -4000 -o -perm -2000 \) -ls 2>/dev/null | \
tee "$suid_file"
# Compare against known good list
current_suid="/tmp/current-suid-$(date +%Y%m%d).txt"
known_good="/var/baseline/known-good-suid.txt"
find / -type f -perm -4000 2>/dev/null | sort > "$current_suid"
diff "$known_good" "$current_suid"
Real Scenario: Post-Breach Forensics (Production-Safe)
#!/bin/bash
# Incident response: Capture system state
# All variables properly quoted!
set -euo pipefail
readonly TIMESTAMP=$(date +%Y%m%d-%H%M%S)
readonly IR_DIR="/var/log/incident-${TIMESTAMP}"
mkdir -p "$IR_DIR"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "${IR_DIR}/incident.log"
}
log "Starting incident response at $TIMESTAMP"
# Network connections (properly quoted filenames)
log "Capturing active network connections..."
netstat -tupan > "${IR_DIR}/network-connections.txt" 2>&1
ss -tupan >> "${IR_DIR}/network-connections.txt" 2>&1
# Running processes
log "Capturing running processes..."
ps auxwww > "${IR_DIR}/processes.txt" 2>&1
ps -eo pid,user,cmd,start_time --sort=-start_time > "${IR_DIR}/processes-timeline.txt" 2>&1
# Open files
log "Capturing open files..."
lsof > "${IR_DIR}/open-files.txt" 2>&1
# Users and authentication
log "Capturing user sessions..."
who > "${IR_DIR}/logged-users.txt" 2>&1
last -F > "${IR_DIR}/login-history.txt" 2>&1
lastb -F > "${IR_DIR}/failed-logins.txt" 2>&1
# Cron jobs (backdoor persistence)
log "Capturing scheduled tasks..."
{
for user in $(cut -f1 -d: /etc/passwd); do
echo "=== Crontab for $user ==="
crontab -u "$user" -l 2>&1 || echo "No crontab for $user"
echo ""
done
} > "${IR_DIR}/crontabs.txt"
cat /etc/crontab /etc/cron.d/* >> "${IR_DIR}/system-crontabs.txt" 2>&1
# Recent files (properly quoted paths)
log "Finding recently modified files..."
recent_files="${IR_DIR}/recent-files.txt"
find / -type f -mtime -7 2>/dev/null | \
grep -v "/proc\|/sys\|/dev" > "$recent_files"
# Suspicious locations
log "Checking suspicious locations..."
ls -laR /tmp /var/tmp /dev/shm > "${IR_DIR}/tmp-directories.txt" 2>&1
# Package verification (if available)
if command -v debsums &>/dev/null; then
log "Verifying package integrity..."
debsums -c > "${IR_DIR}/package-integrity.txt" 2>&1
fi
# Create archive (properly quoted)
log "Creating incident response archive..."
archive_file="${IR_DIR}.tar.gz"
tar czf "$archive_file" "$IR_DIR" 2>&1
log "Incident response complete!"
log "Data collected in: $IR_DIR"
log "Archive created: $archive_file"
2. Live Network Monitoring (Quoted Commands)
# Monitor established connections in real-time
# (-tupn, NOT -tulpn: the -l flag shows only LISTENING sockets, which are never ESTABLISHED)
watch -n 2 'netstat -tupn 2>&1 | grep "ESTABLISHED"'
# Track new connections (properly quoted log file)
log_file="/tmp/connection-monitoring-$(date +%Y%m%d).log"
while true; do
netstat -tupn 2>&1 | grep "ESTABLISHED" | \
awk '{print $5,$7}' | sort | uniq
sleep 5
done | tee "$log_file"
# Find listening services not in expected ports
unexpected_file="/tmp/unexpected-listeners-$(date +%Y%m%d).txt"
netstat -tulpn 2>/dev/null | \
grep "LISTEN" | \
grep -v ":22\|:80\|:443\|:5432\|:6379" | \
tee "$unexpected_file"
# Real-time connection tracking with details
ss -tupn 2>&1 | grep -v "127.0.0.1" | \
awk 'NR>1 {print $5,$6}' | \
sort | uniq -c | sort -nr
3. Rootkit Detection (Production-Safe)
#!/bin/bash
# Rootkit detection with proper quoting
set -euo pipefail
readonly SCAN_DIR="/tmp/rootkit-scan-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$SCAN_DIR"
# Check for hidden processes
ps aux | awk '{print $2}' | sort -n > "${SCAN_DIR}/ps-pids.txt"
ls -l /proc | awk 'NR>1 {print $9}' | grep '^[0-9]' | sort -n > "${SCAN_DIR}/proc-pids.txt"
diff "${SCAN_DIR}/ps-pids.txt" "${SCAN_DIR}/proc-pids.txt" > "${SCAN_DIR}/pid-differences.txt" || true
# Check for hidden files in common locations
find /tmp /var/tmp /dev/shm -name ".*" > "${SCAN_DIR}/hidden-files.txt" 2>&1
# Verify system binaries (properly quoted paths)
binary_hashes="${SCAN_DIR}/binary-hashes.txt"
which -a ps ls netstat ss find | xargs md5sum > "$binary_hashes" 2>&1
# Check for LD_PRELOAD hijacking
echo "LD_PRELOAD: ${LD_PRELOAD:-<not set>}" > "${SCAN_DIR}/ld-preload.txt"
grep -r "LD_PRELOAD" /etc 2>/dev/null >> "${SCAN_DIR}/ld-preload.txt" || true
# List loaded kernel modules
lsmod | sort > "${SCAN_DIR}/kernel-modules.txt"
echo "Rootkit scan complete. Results in: $SCAN_DIR"
4. Log Analysis for Security (With Quoting)
# Failed SSH login attempts (properly quoted output file)
ssh_failures="/tmp/ssh-failures-$(date +%Y%m%d).txt"
grep "Failed password" /var/log/auth.log 2>/dev/null | \
awk '{print $(NF-3)}' | sort | uniq -c | sort -nr | head -20 \
> "$ssh_failures"
# Successful logins
successful_logins="/tmp/successful-logins-$(date +%Y%m%d).txt"
grep "Accepted" /var/log/auth.log 2>/dev/null | \
awk '{print $1,$2,$3,$9,$11}' | tail -50 \
> "$successful_logins"
# Privilege escalation attempts
priv_escalation="/tmp/privilege-escalation-$(date +%Y%m%d).txt"
grep -E "sudo|su\[" /var/log/auth.log 2>/dev/null | tail -100 \
> "$priv_escalation"
# Web server attacks (if applicable)
web_attacks="/tmp/web-attacks-$(date +%Y%m%d).txt"
grep -E "\.\.\/|union.*select|script.*alert" /var/log/apache2/access.log 2>/dev/null | \
awk '{print $1}' | sort | uniq -c | sort -nr \
> "$web_attacks"
Network Diagnostics
1. Connection Troubleshooting (Cisco → Linux, Properly Quoted)
# Layer 1: Interface status
ip link show
interface="eth0"
ip -s link show "$interface"
# Layer 2: ARP table
ip neigh show
arp -n
# Layer 3: Routing table
ip route show
route -n
netstat -rn
# Layer 4: Active connections
ss -tan
netstat -tan
# DNS resolution (properly quoted domain)
domain="example.com"
dig "$domain" +short
nslookup "$domain"
host "$domain"
# Full path trace
target_host="8.8.8.8"
traceroute -n "$target_host"
mtr --report --report-cycles 10 "$target_host"
Cisco ASA Equivalent Commands (With Quoting):
# ASA: show interface
interface="eth0"
ip addr show "$interface"
ip -s link show "$interface"
# ASA: show route
ip route show table all
route -n
# ASA: show conn
ss -tupn
netstat -tupn
# ASA: show xlate (NAT translations)
conntrack -L 2>/dev/null
# ASA: show access-list hitcounts
iptables -L -v -n
nft list ruleset
# ASA: packet-tracer (capture specific host traffic)
target_host="192.168.1.10"
tcpdump -i any -nn host "$target_host"
2. Advanced Network Debugging (Production-Safe)
# Capture traffic on specific interface (properly quoted filenames)
interface="eth0"
port="443"
capture_file="/tmp/capture-$(date +%Y%m%d-%H%M%S).pcap"
traffic_log="/tmp/https-traffic-$(date +%Y%m%d).txt"
tcpdump -i "$interface" -nn -vv port "$port" 2>&1 | tee "$traffic_log"
# Capture and save for Wireshark analysis
tcpdump -i any -s0 -w "$capture_file" \
"port 5432 or port 6379" 2>&1
# Check for packet loss (properly quoted log)
ping_log="/tmp/ping-test-$(date +%Y%m%d).txt"
target="8.8.8.8"
ping -c 100 "$target" 2>&1 | \
tee "$ping_log" | \
grep -E "transmitted|loss"
# Test port connectivity (properly quoted variables)
test_port() {
local host="$1"
local port="$2"
if timeout 5 bash -c "cat < /dev/null > /dev/tcp/${host}/${port}" 2>/dev/null; then
echo "✅ ${host}:${port} is open"
return 0
else
echo "❌ ${host}:${port} is closed or filtered"
return 1
fi
}
# Usage
test_port "192.168.1.1" "80"
test_port "database.example.com" "5432"
3. Firewall Analysis (With Proper Quoting)
# List all firewall rules (properly quoted output files)
timestamp=$(date +%Y%m%d-%H%M%S)
firewall_rules="/tmp/firewall-rules-${timestamp}.txt"
nat_rules="/tmp/nat-rules-${timestamp}.txt"
iptables -L -n -v --line-numbers > "$firewall_rules" 2>&1
iptables -t nat -L -n -v > "$nat_rules" 2>&1
# Count packets per rule
watch -n 5 'iptables -L -n -v | grep -v "^Chain\|^target"'
# Export firewall config (properly quoted backup file)
backup_file="/tmp/iptables-backup-$(date +%Y%m%d).rules"
iptables-save > "$backup_file" 2>&1
# For nftables (modern replacement)
nft_backup="/tmp/nftables-backup-$(date +%Y%m%d).conf"
nft list ruleset > "$nft_backup" 2>&1
Container & Cloud Operations
1. Docker Management (Production-Safe Quoting)
# List containers with status (properly quoted format)
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" 2>&1
# Find containers using specific port
port="5432"
docker ps --filter "publish=${port}" --format "{{.Names}}: {{.Ports}}" 2>&1
# Container resource usage (quoted log file)
stats_file="/tmp/docker-stats-$(date +%Y%m%d-%H%M%S).txt"
docker stats --no-stream 2>&1 | tee "$stats_file"
# Inspect container networking (properly quoted container name)
container_name="domus_postgres"
docker inspect "$container_name" | jq '.[0].NetworkSettings' 2>&1
# Real-time container logs with errors highlighted
docker logs -f "$container_name" 2>&1 | grep --color=always -E "ERROR|WARN|$"
# Export container logs (quoted filenames)
log_file="/tmp/${container_name}-logs-$(date +%Y%m%d).txt"
docker logs "$container_name" > "$log_file" 2>&1
# Container shell (troubleshooting)
docker exec -it "$container_name" /bin/bash 2>&1 || \
docker exec -it "$container_name" /bin/sh 2>&1
# Copy files from container (properly quoted paths)
source_path="/path/to/file"
dest_path="/tmp/extracted-file-$(date +%Y%m%d)"
docker cp "${container_name}:${source_path}" "$dest_path" 2>&1
Production Docker Monitoring Script (All Quoted):
#!/bin/bash
# Monitor Docker containers with proper quoting
set -euo pipefail
readonly LOG_FILE="/var/log/docker-monitor.log"
readonly CHECK_INTERVAL=60
log() {
local timestamp
timestamp=$(date '+%Y-%m-%d %H:%M:%S')
echo "[${timestamp}] $*" | tee -a "$LOG_FILE"
}
while true; do
# Check for stopped containers (properly quoted)
stopped_containers=$(docker ps -a --filter "status=exited" --format "{{.Names}}" 2>&1)
if [ -n "$stopped_containers" ]; then
log "ALERT: Stopped containers: $stopped_containers"
fi
# Check for high memory usage (properly parsed)
docker stats --no-stream 2>&1 | \
awk 'NR>1 {gsub("%","",$7); if ($7 > 80) print $2, $7"%"}' | \
while read -r container mem; do
log "WARNING: ${container} using ${mem} memory"
done
# Check for restarting containers
restarting=$(docker ps --filter "status=restarting" --format "{{.Names}}" 2>&1)
if [ -n "$restarting" ]; then
log "CRITICAL: Restarting containers: $restarting"
fi
sleep "$CHECK_INTERVAL"
done
2. Railway Operations (Your Platform, Properly Quoted!)
#!/bin/bash
# Railway deployment with proper quoting
set -euo pipefail
readonly TIMESTAMP=$(date +%Y%m%d-%H%M%S)
readonly LOG_DIR="${HOME}/deployment-logs"
readonly DEPLOY_LOG="${LOG_DIR}/deploy-${TIMESTAMP}.log"
readonly ERROR_LOG="${LOG_DIR}/deploy-errors-${TIMESTAMP}.log"
mkdir -p "$LOG_DIR"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$DEPLOY_LOG"
}
error() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $*" | tee -a "$ERROR_LOG" >&2
}
# Get Railway logs (properly quoted)
log "Fetching Railway logs..."
railway logs --tail 100 2>&1 | tee "${LOG_DIR}/railway-logs-${TIMESTAMP}.txt"
# Check Railway status
log "Checking Railway status..."
railway status 2>&1 | tee -a "$DEPLOY_LOG"
# Get environment variables (properly quoted grep pattern)
log "Checking environment variables..."
railway variables --kv 2>&1 | \
grep -E "REDIS|DATABASE|NODE_ENV" | \
tee -a "$DEPLOY_LOG"
# Deploy with full logging (properly quoted redirection)
log "Starting deployment..."
if railway up > "${DEPLOY_LOG}.railway" 2> "${ERROR_LOG}.railway"; then
log "Deployment successful!"
else
error "Deployment failed - check ${ERROR_LOG}.railway"
exit 1
fi
Database Administration
1. PostgreSQL Operations (All Properly Quoted!)
# Connection testing (properly quoted URL)
psql "$DATABASE_URL" -c "SELECT version();" 2>&1
# Database size (properly quoted output file)
db_size_report="/tmp/db-size-$(date +%Y%m%d).txt"
psql "$DATABASE_URL" -c "
SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;" 2>&1 \
> "$db_size_report"
# Table sizes
table_size_report="/tmp/table-sizes-$(date +%Y%m%d).txt"
psql "$DATABASE_URL" -c "
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;" 2>&1 \
> "$table_size_report"
# Active connections
connections_report="/tmp/active-connections-$(date +%Y%m%d).txt"
psql "$DATABASE_URL" -c "
SELECT
pid,
usename,
application_name,
client_addr,
state,
query_start,
state_change
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;" 2>&1 \
> "$connections_report"
# Long-running queries
long_queries="/tmp/long-queries-$(date +%Y%m%d).txt"
psql "$DATABASE_URL" -c "
SELECT
pid,
now() - query_start AS duration,
query,
state
FROM pg_stat_activity
WHERE state != 'idle'
AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY duration DESC
LIMIT 10;" 2>&1 | tee "$long_queries"
# Backup database (CRITICAL: properly quoted filenames!)
backup_timestamp=$(date +%Y%m%d-%H%M%S)
backup_file="/tmp/backup-${backup_timestamp}.sql"
error_log="/tmp/backup-errors-${backup_timestamp}.log"
pg_dump "$DATABASE_URL" \
--no-owner \
--no-acl \
> "$backup_file" \
2> "$error_log"
# Verify backup (properly quoted file check)
if [ -s "$backup_file" ]; then
echo "✅ Backup created successfully: $backup_file"
ls -lh "$backup_file"
else
echo "❌ Backup failed - check errors: $error_log"
cat "$error_log"
exit 1
fi
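The verification step hinges on `[ -s file ]`, which is true only when the file exists AND is non-empty; a throwaway file shows both sides:

```shell
f=$(mktemp)                    # exists, but zero bytes
if [ -s "$f" ]; then before=yes; else before=no; fi

printf 'data' > "$f"           # now it has content
if [ -s "$f" ]; then after=yes; else after=no; fi

rm -f "$f"
echo "before=$before after=$after"
```

This is why the backup check catches the "pg_dump created the file but wrote nothing" failure mode that a plain `[ -f ]` would miss.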
2. Redis Operations (Your Stack, Properly Quoted!)
# Connect and get info (properly quoted URL and output)
redis_info="/tmp/redis-info-$(date +%Y%m%d).txt"
redis-cli -u "$REDIS_URL" INFO 2>&1 | tee "$redis_info"
# Monitor commands in real-time (properly quoted log file)
monitor_log="/tmp/redis-monitor-$(date +%Y%m%d-%H%M%S).log"
redis-cli -u "$REDIS_URL" MONITOR 2>&1 | tee "$monitor_log"
# Check memory usage
redis-cli -u "$REDIS_URL" INFO MEMORY 2>&1
# List keys sample (CAREFUL in production! Properly quoted output)
keys_sample="/tmp/redis-keys-sample-$(date +%Y%m%d).txt"
redis-cli -u "$REDIS_URL" --scan --pattern "*" 2>&1 | \
head -100 > "$keys_sample"
# Get key count
key_count=$(redis-cli -u "$REDIS_URL" DBSIZE 2>&1)
echo "Redis key count: $key_count"
# Check latency
redis-cli -u "$REDIS_URL" --latency 2>&1
# Test connectivity with timing
echo "Testing Redis connectivity..."
time redis-cli -u "$REDIS_URL" PING 2>&1
Complete Production Deployment Script
#!/bin/bash
# Production deployment - Railway/Domus Digitalis
# ALL BEST PRACTICES APPLIED!
set -euo pipefail
IFS=$'\n\t'
# ============================================================================
# CONSTANTS (All properly quoted!)
# ============================================================================
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly PROJECT_ROOT="/home/evanusmodestus/atelier/_projects/personal/domus-digitalis"
readonly TIMESTAMP=$(date +%Y%m%d-%H%M%S)
readonly LOG_DIR="${HOME}/deployment-logs"
readonly DEPLOY_LOG="${LOG_DIR}/deploy-${TIMESTAMP}.log"
readonly ERROR_LOG="${LOG_DIR}/deploy-errors-${TIMESTAMP}.log"
# ============================================================================
# LOGGING FUNCTIONS
# ============================================================================
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$DEPLOY_LOG"
}
error() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $*" | tee -a "$ERROR_LOG" >&2
}
success() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ✅ $*" | tee -a "$DEPLOY_LOG"
}
# ============================================================================
# CLEANUP
# ============================================================================
cleanup() {
local exit_code=$?
if [ $exit_code -ne 0 ]; then
error "Deployment failed with exit code: $exit_code"
error "Check logs: $ERROR_LOG"
else
success "Deployment completed successfully"
fi
# Clean up temp files (properly quoted globs)
rm -f "${PROJECT_ROOT}/apps/backend/"*.sql
}
trap cleanup EXIT
# ============================================================================
# PRE-FLIGHT CHECKS
# ============================================================================
preflight_checks() {
log "Running pre-flight checks..."
# Check we're in the right directory
if [ ! -d "$PROJECT_ROOT" ]; then
error "Project root not found: $PROJECT_ROOT"
return 1
fi
cd "$PROJECT_ROOT" || {
error "Cannot change to project root"
return 1
}
# Verify git status
if ! git status &>/dev/null; then
error "Not a git repository"
return 1
fi
# Check for uncommitted changes
if ! git diff-index --quiet HEAD -- 2>/dev/null; then
error "Uncommitted changes detected - commit first"
return 1
fi
# Verify Railway CLI
if ! command -v railway &>/dev/null; then
error "Railway CLI not found - install first"
return 1
fi
success "Pre-flight checks passed"
}
# ============================================================================
# BACKUP PRODUCTION DATABASE
# ============================================================================
backup_production() {
log "Backing up production database..."
local backup_dir="${HOME}/backups"
mkdir -p "$backup_dir"
local backup_file="${backup_dir}/prod-backup-${TIMESTAMP}.sql"
local backup_errors="${backup_dir}/prod-backup-errors-${TIMESTAMP}.log"
# Load production secrets and backup
if load-secrets production applications && \
pg_dump "$DATABASE_PUBLIC_URL" > "$backup_file" 2> "$backup_errors"; then
if [ -s "$backup_file" ]; then
success "Backup created: $backup_file ($(du -h "$backup_file" | cut -f1))"
else
error "Backup file is empty - check: $backup_errors"
return 1
fi
else
log "No existing production database to backup (or backup failed)"
# Don't fail deployment if no DB exists yet
fi
}
# ============================================================================
# PREPARE DATA EXPORT
# ============================================================================
prepare_data() {
log "Preparing data export from local Docker..."
# Verify Docker is running
if ! docker ps &>/dev/null; then
error "Docker is not running"
return 1
fi
# Export from local database
local sql_file="${PROJECT_ROOT}/apps/backend/production-seed.sql"
local container="domus_postgres"
if docker exec "$container" pg_dump -U domus_user domus_dev > "$sql_file" 2>&1; then
success "Data exported: $sql_file ($(du -h "$sql_file" | cut -f1))"
else
error "Failed to export data from Docker"
return 1
fi
# Verify SQL file
if [ ! -s "$sql_file" ]; then
error "SQL file is empty"
return 1
fi
# Quick sanity check (grep -c prints 0 even with no matches but exits non-zero;
# || true absorbs the exit status without adding a second "0" to the output)
local insert_count
insert_count=$(grep -c "INSERT INTO" "$sql_file" || true)
log "SQL file contains $insert_count INSERT statements"
}
# ============================================================================
# DEPLOY TO RAILWAY
# ============================================================================
deploy_railway() {
log "Deploying to Railway production..."
# Link to production
railway link <<EOF
domusdigitalis-production
production
ExpressJS-production
EOF
# Verify environment variables
log "Verifying environment variables..."
local redis_url
redis_url=$(railway variables --kv 2>&1 | grep "^REDIS_URL=" | cut -d'=' -f2-)
if [[ ! "$redis_url" =~ REDIS_PUBLIC_URL ]]; then
error "REDIS_URL is not set to \${{REDIS_PUBLIC_URL}}"
error "Current value: $redis_url"
return 1
fi
success "Environment variables verified"
# Deploy
log "Running railway up..."
if railway up 2>&1 | tee -a "$DEPLOY_LOG"; then
success "Railway deployment completed"
else
error "Railway deployment failed"
return 1
fi
# Wait for deployment to stabilize
log "Waiting 30 seconds for deployment to stabilize..."
sleep 30
}
# ============================================================================
# SEED DATABASE
# ============================================================================
seed_database() {
log "Seeding production database..."
local sql_file="${PROJECT_ROOT}/apps/backend/production-seed.sql"
if [ ! -f "$sql_file" ]; then
error "SQL file not found: $sql_file"
return 1
fi
# Import data (CRITICAL: chained with load-secrets!)
if load-secrets production applications && \
psql "$DATABASE_PUBLIC_URL" < "$sql_file" 2>&1 | tee -a "$DEPLOY_LOG"; then
success "Database seeded successfully"
else
error "Database seeding failed"
return 1
fi
# Verify data
log "Verifying database contents..."
local project_count
# -tA (tuples-only, unaligned) strips psql's padding so the numeric test below is reliable
project_count=$(load-secrets production applications && \
psql "$DATABASE_PUBLIC_URL" -tA -c "SELECT COUNT(*) FROM projects" 2>&1)
log "Projects in database: $project_count"
if [ "$project_count" -lt 1 ]; then
error "Database appears empty after seeding"
return 1
fi
success "Database verification passed"
}
# ============================================================================
# VERIFY DEPLOYMENT
# ============================================================================
verify_deployment() {
log "Verifying deployment..."
# Get production URL
local prod_url
prod_url=$(railway status 2>&1 | grep "URL:" | awk '{print $2}')
if [ -z "$prod_url" ]; then
error "Could not determine production URL"
return 1
fi
log "Production URL: $prod_url"
# Health check
log "Running health check..."
local health_response
health_response=$(curl -s "${prod_url}/api/health" 2>&1)
if echo "$health_response" | jq -e '.status == "ok"' &>/dev/null; then
success "Health check passed"
echo "$health_response" | jq . | tee -a "$DEPLOY_LOG"
else
error "Health check failed"
error "Response: $health_response"
return 1
fi
# Test API endpoints
log "Testing API endpoints..."
local projects_count
projects_count=$(curl -s "${prod_url}/api/projects?language=en" 2>&1 | jq '. | length' 2>&1)
log "API returned $projects_count projects"
success "All verification checks passed"
}
# ============================================================================
# CLEANUP TEMP FILES
# ============================================================================
cleanup_temp_files() {
log "Cleaning up temporary files..."
# Remove SQL file (security!)
local sql_file="${PROJECT_ROOT}/apps/backend/production-seed.sql"
if [ -f "$sql_file" ]; then
rm -f "$sql_file"
success "Removed temporary SQL file"
fi
# Verify it's gone
if [ -f "$sql_file" ]; then
error "Failed to remove SQL file: $sql_file"
return 1
fi
}
# ============================================================================
# MAIN
# ============================================================================
main() {
mkdir -p "$LOG_DIR"
log "=========================================="
log "Production Deployment Started"
log "Timestamp: $TIMESTAMP"
log "=========================================="
preflight_checks || exit 1
backup_production || exit 1
prepare_data || exit 1
deploy_railway || exit 1
seed_database || exit 1
verify_deployment || exit 1
cleanup_temp_files || exit 1
log "=========================================="
log "Deployment Complete!"
log "Logs: $DEPLOY_LOG"
log "=========================================="
}
main "$@"
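A subtlety worth isolating from prepare_data's sanity check: grep -c always prints a count (including 0), but exits non-zero when nothing matched, which would abort a `set -e` script mid-count. `|| true` absorbs the exit status without polluting the captured output:

```shell
tmp=$(mktemp)
printf 'INSERT INTO a ...;\nINSERT INTO b ...;\n' > "$tmp"
count=$(grep -c "INSERT INTO" "$tmp" || true)     # two matches

empty=$(mktemp)                                   # zero matches: grep prints 0, exits 1
zero=$(grep -c "INSERT INTO" "$empty" || true)    # || true swallows the failure

rm -f "$tmp" "$empty"
echo "count=$count zero=$zero"
```

Appending `|| echo "0"` instead would print a second zero, and the captured value would contain two lines.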
Quick Reference Card
Shell Best Practices Cheat Sheet
# ✅ ALWAYS QUOTE
url="https://api.example.com/data?user=john&limit=10"
curl "$url"
# ✅ ALWAYS USE $()
timestamp=$(date +%Y%m%d-%H%M%S)
backup_file="backup-${timestamp}.sql"
# ✅ QUOTE VARIABLES
pg_dump "$DATABASE_URL" > "$backup_file" 2> "${backup_file}.errors"
# ✅ QUOTE FILE PATHS
config_file="/etc/myapp/config.conf"
cat "$config_file"
# ✅ ARRAY ELEMENTS
files=("file1.txt" "file2.txt" "file3.txt")
for file in "${files[@]}"; do
echo "Processing: $file"
done
# ✅ COMMAND SUBSTITUTION
result=$(complex_command | grep "pattern" | awk '{print $2}')
# ✅ HERE DOCUMENTS (for multi-line strings)
# Note: 'EOF' is quoted, so $DATABASE_URL is written literally into the file;
# use an unquoted EOF delimiter if you want the variable expanded
cat > "config.yml" << 'EOF'
database:
  url: $DATABASE_URL
  pool: 10
EOF
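Whether the heredoc delimiter is quoted controls expansion of the body — both behaviors in one runnable sketch:

```shell
name="world"

# Quoted delimiter ('EOF'): the body is literal, $name survives as text
literal=$(cat << 'EOF'
hello $name
EOF
)

# Unquoted delimiter (EOF): the body expands like a double-quoted string
expanded=$(cat << EOF
hello $name
EOF
)

echo "$literal / $expanded"
```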
Stream Redirection Cheat Sheet
# Redirect stdout
command > "file.txt"
# Redirect stderr
command 2> "errors.txt"
# Redirect both (modern)
command &> "output.txt"
# Redirect both (traditional)
command > "output.txt" 2>&1
# Pipe both streams
command 2>&1 | grep "pattern"
# Show and save
command 2>&1 | tee "output.txt"
# Silence completely
command &> /dev/null