Post-Setup TODOs

Phase 0: Toolchain

  • Install poppler, tesseract, exiftool, imagemagick (pacman)

  • Install sleuthkit — full TSK suite (pacman)

  • Install binwalk (pacman)

  • Install tshark, tcpdump, ngrep (pacman)

  • Install testdisk, which also provides photorec (pacman)

  • Install hashdeep (AUR)

  • Install fdupes (pacman)

  • Validate all 21 tools with version check

  • Live test: exiftool + binwalk + strings + identify on personal photo

  • Install AUR extras: dc3dd, libewf, scalpel, foremost, bulk_extractor, volatility3, findimagedupes, tcpflow

  • Create codex entries for each tool category
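One quick way to start the "validate all 21 tools" step is a PATH check. The sketch below is a minimal Python example that only confirms each executable resolves with shutil.which. It does not run version flags, because those differ from tool to tool. The executable names are assumptions about how the packages install and may need adjusting: identify for ImageMagick, fls/mactime for sleuthkit, vol for volatility3, and strings from binutils for the live test.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional

# Assumed executable names for the Phase 0 packages; the package-to-binary
# mapping (e.g. "vol" for volatility3, "identify" for ImageMagick) may differ.
EXPECTED_BINARIES = [
    "pdftotext", "tesseract", "exiftool", "identify",   # documents/images
    "fls", "mactime", "binwalk", "strings",             # disk/firmware
    "tshark", "tcpdump", "ngrep", "tcpflow",            # network
    "photorec", "testdisk", "foremost", "scalpel",      # carving
    "hashdeep", "fdupes", "findimagedupes",             # hashing/dedup
    "dc3dd", "ewfacquire", "bulk_extractor", "vol",     # imaging/memory
]


def check_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


def missing(report: Dict[str, Optional[str]]) -> List[str]:
    """Names that could not be resolved, in the order they were checked."""
    return [name for name, path in report.items() if path is None]


def main() -> int:
    report = check_tools(EXPECTED_BINARIES)
    for name, path in report.items():
        status = "ok" if path else "MISSING"
        print(f"{status:8} {name:16} {path or ''}")
    return len(missing(report))


if __name__ == "__main__":
    main()
```

main() returns the number of unresolved executables, so a wrapper can treat anything above zero as a failed check. It makes most sense to run it after the AUR extras are in place.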

Phase 1: Document Pipeline

  • Test pdftotext on sample document library

  • Test tesseract OCR on scanned documents

  • Build batch extraction pipeline script

  • Tune tesseract PSM modes for common scan types

  • Test pandoc conversions (docx, epub, html)
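For the batch extraction pipeline, one workable shape works in two passes. First, run pdftotext -layout over every PDF. Then flag any file whose output is nearly empty as a likely image-only scan that needs tesseract instead. The sketch below assumes that split. The 50-character threshold and the output layout (one .txt per PDF, mirroring the source tree) are illustrative choices, not tuned values. tesseract_cmd only builds the command line for a single page image. Rendering PDF pages to images (for example with poppler's pdftoppm) is left out.

```python
import subprocess
from pathlib import Path
from typing import List

# Illustrative threshold: fewer alphanumeric characters than this in the
# pdftotext output suggests a scan with no usable text layer.
MIN_TEXT_CHARS = 50


def pdftotext_cmd(pdf: Path, out_txt: Path) -> List[str]:
    """poppler's pdftotext, keeping the physical page layout."""
    return ["pdftotext", "-layout", str(pdf), str(out_txt)]


def tesseract_cmd(image: Path, out_base: Path, psm: int = 3) -> List[str]:
    """tesseract on one page image; it writes out_base + '.txt' itself."""
    return ["tesseract", str(image), str(out_base), "--psm", str(psm)]


def needs_ocr(text: str, min_chars: int = MIN_TEXT_CHARS) -> bool:
    """True when extracted text is too sparse to be a real text layer."""
    return sum(ch.isalnum() for ch in text) < min_chars


def extract_library(src: Path, dst: Path) -> List[Path]:
    """Extract every *.pdf under src into dst; return PDFs needing OCR/review.

    Note: rglob("*.pdf") is case-sensitive on Linux, so *.PDF is skipped.
    """
    flagged: List[Path] = []
    for pdf in sorted(src.rglob("*.pdf")):
        out = dst / pdf.relative_to(src).with_suffix(".txt")
        out.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(pdftotext_cmd(pdf, out), capture_output=True)
        if result.returncode != 0 or not out.exists():
            flagged.append(pdf)  # encrypted or corrupt: review by hand
            continue
        if needs_ocr(out.read_text(encoding="utf-8", errors="replace")):
            flagged.append(pdf)
    return flagged
```

For PSM tuning, a reasonable first comparison on each scan type is between three modes: --psm 3 (tesseract's default, fully automatic page segmentation), 4 (a single column of text of variable sizes), and 6 (a single uniform block of text).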

Phase 2: Image Pipeline

  • Run full exiftool audit on personal photo library

  • Generate geolocation report

  • Run fdupes dedup pass

  • Test perceptual hash dedup with findimagedupes

  • Build photo sanitization script for sharing
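The geolocation report can be driven by exiftool's JSON output. This sketch assumes exiftool -json -n -r and reads GPSLatitude/GPSLongitude plus their Ref tags. It then applies the hemisphere sign from the Ref tag, so the result is the same whether exiftool returned a signed or an unsigned value. The CSV layout (file, lat, lon) is a hypothetical choice for the report.

```python
import csv
import io
import json
import subprocess
from typing import Dict, List, Optional, Tuple


def exiftool_gps_cmd(root: str) -> List[str]:
    # -json: machine-readable; -n: numeric values; -r: recurse into root
    return ["exiftool", "-json", "-n", "-r",
            "-GPSLatitude", "-GPSLatitudeRef",
            "-GPSLongitude", "-GPSLongitudeRef", root]


def signed(value: float, ref: Optional[str], negative_ref: str) -> float:
    """Apply hemisphere sign from a Ref tag ('S'/'South' or 'W'/'West')."""
    if not ref:
        return value
    return -abs(value) if ref.upper().startswith(negative_ref) else abs(value)


def parse_gps(records: List[dict]) -> Dict[str, Tuple[float, float]]:
    """Map SourceFile -> (lat, lon) for records that carry both coordinates."""
    points: Dict[str, Tuple[float, float]] = {}
    for rec in records:
        lat, lon = rec.get("GPSLatitude"), rec.get("GPSLongitude")
        if lat is None or lon is None:
            continue
        points[rec["SourceFile"]] = (
            signed(float(lat), rec.get("GPSLatitudeRef"), "S"),
            signed(float(lon), rec.get("GPSLongitudeRef"), "W"),
        )
    return points


def to_csv(points: Dict[str, Tuple[float, float]]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["file", "lat", "lon"])
    for path, (lat, lon) in sorted(points.items()):
        writer.writerow([path, f"{lat:.6f}", f"{lon:.6f}"])
    return buf.getvalue()


def geolocation_report(root: str) -> str:
    proc = subprocess.run(exiftool_gps_cmd(root), capture_output=True, text=True)
    return to_csv(parse_gps(json.loads(proc.stdout or "[]")))
```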

Phase 3: System & Disk Forensics

  • Create test disk image with dd

  • Practice E01 imaging with ewfacquire

  • Run fls + mactime timeline on test image

  • Practice file carving with photorec/foremost

  • Test bulk_extractor artifact extraction

  • Build EnCase-equivalent workflow runbook
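For the fls + mactime step, fls -r -m / writes a TSK body file and mactime turns it into a timeline. The Python below is not a replacement for mactime. It is a toy re-implementation that makes the MACB merging explicit, which can help when checking results on a small test image. It assumes the TSK 3.x body format (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime) and treats a timestamp of 0 as "not recorded". On a partitioned image, the sector offset for fls -o would come from mmls.

```python
from typing import Dict, Iterable, List, Set, Tuple


def fls_bodyfile_cmd(image: str, offset: int = 0) -> List[str]:
    """fls in body-file mode; offset is the partition start in sectors."""
    cmd = ["fls", "-r", "-m", "/"]
    if offset:
        cmd += ["-o", str(offset)]
    return cmd + [image]


def parse_body_line(line: str) -> Tuple[str, Dict[str, int]]:
    """Return (name, {m, a, c, b: epoch seconds}) from one body-file line."""
    _md5, rest = line.rstrip("\n").split("|", 1)
    # rsplit keeps any '|' inside the file name attached to the name
    (name, _inode, _mode, _uid, _gid, _size,
     atime, mtime, ctime, crtime) = rest.rsplit("|", 9)
    return name, {"m": int(mtime), "a": int(atime),
                  "c": int(ctime), "b": int(crtime)}


def timeline(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Merge timestamps per (time, file) into mactime-style MACB flags."""
    events: Dict[Tuple[int, str], Set[str]] = {}
    for line in lines:
        if not line.strip():
            continue
        name, times = parse_body_line(line)
        for flag, ts in times.items():
            if ts > 0:  # assumption: 0 means the time was not recorded
                events.setdefault((ts, name), set()).add(flag)
    return [
        (ts, "".join(f if f in flags else "." for f in "macb"), name)
        for (ts, name), flags in sorted(events.items())
    ]
```

Real runs would still use fls -r -m / disk.img > body.txt followed by mactime -b body.txt -d > timeline.csv, where -d asks mactime for comma-delimited output. The file names here are placeholders.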

Phase 4: Network Forensics

  • Capture sample traffic with tcpdump

  • Practice tshark field extraction

  • Reconstruct TCP sessions with tcpflow

  • Build beaconing detection pipeline

  • Test DNS tunneling detection
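A common starting point for beaconing detection is timing regularity: a host that connects to the same destination at near-constant intervals. The sketch below assumes tshark field output (frame.time_epoch, ip.src, ip.dst) for TCP SYNs and groups it per (src, dst). Each pair is scored by the coefficient of variation of its inter-arrival gaps. The 0.1 cutoff and six-event minimum are illustrative, and jittered beacons will need a looser threshold. IPv6 rows, which have an empty ip.src, are simply skipped.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Field extraction for TCP SYNs; capture.pcap is a placeholder path.
TSHARK_CMD = [
    "tshark", "-r", "capture.pcap",
    "-Y", "tcp.flags.syn == 1 && tcp.flags.ack == 0",
    "-T", "fields", "-e", "frame.time_epoch", "-e", "ip.src", "-e", "ip.dst",
    "-E", "separator=,",
]

Pair = Tuple[str, str]


def parse_rows(lines: Iterable[str]) -> Dict[Pair, List[float]]:
    """Group epoch timestamps by (src, dst); rows with empty fields are skipped."""
    flows: Dict[Pair, List[float]] = defaultdict(list)
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3 or not all(parts):
            continue
        ts, src, dst = parts
        flows[(src, dst)].append(float(ts))
    return flows


def interval_cv(times: List[float], min_events: int = 6) -> Optional[float]:
    """Coefficient of variation of inter-arrival gaps (0.0 = perfectly periodic)."""
    if len(times) < min_events:
        return None
    ordered = sorted(times)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    mean = statistics.mean(gaps)
    if mean <= 0:
        return None
    return statistics.pstdev(gaps) / mean


def find_beacons(
    flows: Dict[Pair, List[float]], max_cv: float = 0.1, min_events: int = 6
) -> List[Tuple[Pair, float, int]]:
    """Pairs whose timing is regular enough to look like beaconing."""
    hits = []
    for pair, times in flows.items():
        cv = interval_cv(times, min_events)
        if cv is not None and cv <= max_cv:
            hits.append((pair, round(cv, 4), len(times)))
    return sorted(hits, key=lambda hit: hit[1])
```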

Phase 5: Automation

  • Build evidence collection script

  • Set up weekly photo metadata audit cron

  • Set up daily integrity baseline check

  • Build batch OCR processing function

  • Build metadata sanitization function
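For the daily integrity baseline, hashdeep's audit mode is the tool this list already installs. The sketch below is a stdlib-only Python stand-in that shows the same idea. It hashes every file under a root once and stores the map. A later scan is then diffed against it into added, removed, and modified files. SHA-256, the JSON baseline file, and all function names are assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_baseline(root: Path) -> Dict[str, str]:
    """Relative POSIX path -> SHA-256 for every regular file under root."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify differences between a stored baseline and a fresh scan."""
    common = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(k for k in common if baseline[k] != current[k]),
    }


def save_baseline(baseline: Dict[str, str], path: Path) -> None:
    path.write_text(json.dumps(baseline, indent=2, sort_keys=True))


def load_baseline(path: Path) -> Dict[str, str]:
    return json.loads(path.read_text())
```

The baseline file should live outside the scanned root, or it will show up as a change on the next run. With hashdeep itself, the same flow is hashdeep -r <dir> > known.txt once, then hashdeep -r -a -k known.txt <dir> for the daily audit.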

Phase 6: AI Integration

  • Test document summarization pipeline with Ollama

  • Test image classification with LLaVA

  • Build log analysis pipeline

  • Explore embedding-based file clustering
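For the Ollama summarization test, a map-reduce pattern keeps long documents within the model's context. The text is split into chunks, each chunk is summarized, and then the summaries are summarized. This sketch assumes Ollama's default local endpoint (http://localhost:11434/api/generate) with non-streaming responses. It uses llama3 purely as a placeholder model name. The prompts and the 4,000-character chunk size are illustrative, not tuned.

```python
import json
import urllib.request
from typing import Callable, List

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint
MODEL = "llama3"  # placeholder: any locally pulled model name


def chunk_text(text: str, max_chars: int = 4000) -> List[str]:
    """Pack paragraphs into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        while len(para) > max_chars:  # hard-split an oversized paragraph
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_request(prompt: str, model: str = MODEL) -> dict:
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = MODEL, url: str = OLLAMA_URL,
             timeout: float = 300.0) -> str:
    """POST one prompt to Ollama and return the generated text."""
    body = json.dumps(build_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


def summarize(text: str, ask: Callable[[str], str] = generate,
              max_chars: int = 4000) -> str:
    """Map: summarize each chunk. Reduce: merge the partial summaries."""
    partials = [
        ask(f"Summarize this excerpt in a few sentences:\n\n{chunk}")
        for chunk in chunk_text(text, max_chars)
    ]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    joined = "\n\n".join(partials)
    return ask(f"Combine these partial summaries into one summary:\n\n{joined}")
```

Passing a stub as ask lets the chunking and reduce logic be tested without a running Ollama server. The same function then works against the real endpoint once a model is pulled.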