Post-Setup TODOs

Phase 0: Toolchain

Install poppler, tesseract (plus a language pack, e.g. tesseract-data-eng), exiftool (packaged as perl-image-exiftool), imagemagick (pacman)
Install sleuthkit, the full TSK suite (pacman)
Install binwalk (pacman)
Install tshark (packaged as wireshark-cli), tcpdump, ngrep (pacman)
Install photorec/testdisk (pacman)
Install hashdeep (AUR)
Install fdupes (pacman)
Validate all 21 tools with version check
Live test: exiftool + binwalk + strings + identify on personal photo
Install AUR extras: dc3dd, libewf, scalpel, foremost, bulk_extractor, volatility3, findimagedupes, tcpflow
Create codex entries for each tool category
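For the version-check step, a small script can report which tools are on PATH and print the first line of each one's version output. This is a minimal sketch: the per-tool version flags in VERSION_FLAGS are assumptions based on each tool's usual convention, and the table covers only a subset of the toolchain, so extend it and correct any flag a tool rejects.

```python
"""Check that forensic CLI tools are on PATH and report their version line."""
import shutil
import subprocess

# Assumed version flags per tool; adjust if a tool rejects its flag.
VERSION_FLAGS = {
    "pdftotext": ["-v"],
    "tesseract": ["--version"],
    "exiftool": ["-ver"],
    "identify": ["-version"],
    "fls": ["-V"],
    "tshark": ["--version"],
    "tcpdump": ["--version"],
    "fdupes": ["--version"],
}


def check_tool(name, flags):
    """Return the first non-empty output line, or None if the tool is missing."""
    path = shutil.which(name)
    if path is None:
        return None
    try:
        proc = subprocess.run(
            [path, *flags], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"error: {exc}"
    # Some tools print their version on stderr rather than stdout.
    for line in (proc.stdout + proc.stderr).splitlines():
        if line.strip():
            return line.strip()
    return "(no output)"


def check_all(table=VERSION_FLAGS):
    """Run check_tool for every entry in the table."""
    return {name: check_tool(name, flags) for name, flags in table.items()}


if __name__ == "__main__":
    results = check_all()
    for name, version in results.items():
        print(f"{name:12} {version or 'MISSING'}")
    missing = [n for n, v in results.items() if v is None]
    raise SystemExit(1 if missing else 0)
```

The non-zero exit status when anything is missing makes the script usable as a gate before the AUR extras are installed.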

Phase 1: Document Pipeline

Test pdftotext on sample document library
Test tesseract OCR on scanned documents
Build batch extraction pipeline script
Tune tesseract --psm (page segmentation mode) settings for common scan types
Test pandoc conversions (docx, epub, html)
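One possible shape for the batch extraction script: route PDFs to pdftotext and images to tesseract, building the command lists first so the plan can be inspected before anything runs. This is a sketch under assumptions: the suffix-to-tool mapping, the use of -layout, and the default --psm of 3 are illustrative choices, files sharing a stem will overwrite each other's output, and image-only PDFs will yield little or no text from pdftotext and would need rasterizing before OCR.

```python
"""Plan (and optionally run) text extraction over a document folder."""
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def plan_commands(paths, outdir, psm=3):
    """Map each input file to the argv that extracts its text into outdir."""
    outdir = Path(outdir)
    plan = []
    for src in map(Path, paths):
        suffix = src.suffix.lower()
        if suffix == ".pdf":
            # -layout keeps column positions; output is outdir/<stem>.txt
            plan.append(["pdftotext", "-layout", str(src),
                         str(outdir / f"{src.stem}.txt")])
        elif suffix in IMAGE_SUFFIXES:
            # tesseract appends ".txt" to the output base name itself
            plan.append(["tesseract", str(src), str(outdir / src.stem),
                         "--psm", str(psm)])
        # other suffixes are skipped (e.g. leave docx/epub to pandoc)
    return plan


def run_plan(plan):
    """Run each command and return the ones that exited non-zero."""
    failures = []
    for argv in plan:
        if subprocess.run(argv).returncode != 0:
            failures.append(argv)
    return failures


if __name__ == "__main__":
    src_dir, out_dir = Path(sys.argv[1]), Path(sys.argv[2])
    out_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    failed = run_plan(plan_commands(files, out_dir))
    print(f"{len(failed)} extraction(s) failed")
```

Keeping planning separate from execution also makes PSM tuning easier: regenerate the plan with a different psm value and rerun only the image entries.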

Phase 2: Image Pipeline

Run full exiftool audit on personal photo library
Generate geolocation report
Run fdupes dedup pass
Test perceptual hash dedup with findimagedupes
Build photo sanitization script for sharing
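For the geolocation report, exiftool's JSON output can be reduced to a CSV of geotagged files. This sketch assumes input produced by something like `exiftool -json -n -r -GPSLatitude -GPSLongitude PHOTO_DIR` (PHOTO_DIR is a placeholder) and assumes the coordinates arrive as signed decimal degrees. Since EXIF stores the hemisphere separately in GPSLatitudeRef/GPSLongitudeRef, check the signs against a known southern- or western-hemisphere photo before trusting the report.

```python
"""Summarise GPS tags from exiftool JSON output into a CSV report."""
import csv
import io
import json
import sys


def gps_rows(records):
    """Yield (file, lat, lon) for records carrying numeric GPS coordinates."""
    for rec in records:
        lat, lon = rec.get("GPSLatitude"), rec.get("GPSLongitude")
        if isinstance(lat, (int, float)) and isinstance(lon, (int, float)):
            yield rec.get("SourceFile", "?"), float(lat), float(lon)


def report(json_text):
    """Return (csv_text, geotagged_count, untagged_count)."""
    records = json.loads(json_text)
    rows = list(gps_rows(records))
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["file", "latitude", "longitude"])
    writer.writerows(rows)
    return buf.getvalue(), len(rows), len(records) - len(rows)


if __name__ == "__main__":
    text, tagged, untagged = report(sys.stdin.read())
    sys.stdout.write(text)
    print(f"# {tagged} geotagged, {untagged} without GPS", file=sys.stderr)
```

The geotagged list doubles as the input list for the sanitization script, since those are the files that most need stripping before sharing.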

Phase 3: System & Disk Forensics

Create test disk image with dd
Practice E01 imaging with ewfacquire
Run fls + mactime timeline on test image
Practice file carving with photorec/foremost
Test bulk_extractor artifact extraction
Build EnCase-equivalent workflow runbook
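To understand what the fls + mactime step produces, the core of it can be reproduced in a few lines: read a TSK 3.x body file (as written by `fls -r -m / image.dd`) and emit one event per distinct timestamp with MACB flags. This is a learning sketch, not a mactime replacement; it assumes the 11-field body format (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime), treats a timestamp of 0 as unset, and prints UTC only, whereas mactime itself has time zone and date-range options.

```python
"""Turn a TSK 3.x body file (from `fls -r -m`) into a sorted MACB timeline."""
import sys
from datetime import datetime, timezone


def parse_body_line(line):
    """Split one body line; the name field may itself contain '|'."""
    fields = line.rstrip("\n").split("|")
    if len(fields) < 11:
        raise ValueError(f"not a body-file line: {line!r}")
    name = "|".join(fields[1:-9])  # last 9 fields are inode..crtime
    atime, mtime, ctime, crtime = (int(x) for x in fields[-4:])
    return name, {"m": mtime, "a": atime, "c": ctime, "b": crtime}


def timeline(lines):
    """Return (epoch, macb, name) events sorted by time; 0 means 'unset'."""
    events = []
    for line in lines:
        if not line.strip():
            continue
        name, times = parse_body_line(line)
        for ts in sorted({t for t in times.values() if t > 0}):
            flags = "".join(k if times[k] == ts else "." for k in "macb")
            events.append((ts, flags, name))
    events.sort()
    return events


if __name__ == "__main__":
    for ts, flags, name in timeline(sys.stdin):
        stamp = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
        print(f"{stamp} {flags} {name}")
```

Comparing this output with mactime's on the same test image is a quick way to confirm the body file was generated correctly.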

Phase 4: Network Forensics

Capture sample traffic with tcpdump
Practice tshark field extraction
Reconstruct TCP sessions with tcpflow
Build beaconing detection pipeline
Test DNS tunneling detection
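A first cut at the beaconing pipeline can combine tshark field extraction with a regularity test on connection intervals. The sketch below assumes input from something like `tshark -r capture.pcap -Y "tcp.flags.syn==1 && tcp.flags.ack==0" -T fields -e frame.time_epoch -e ip.src -e ip.dst` (capture.pcap is a placeholder), so each line is one connection attempt. The thresholds (at least 6 events, coefficient of variation at most 0.1) are arbitrary starting points, and real beacons with added jitter will need looser settings.

```python
"""Flag src->dst pairs whose connection intervals are suspiciously regular."""
import statistics
import sys
from collections import defaultdict


def load_events(lines):
    """Parse tab-separated 'epoch<TAB>src<TAB>dst' lines (tshark -T fields)."""
    flows = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or not all(parts):
            continue  # skip non-IP packets or malformed lines
        flows[(parts[1], parts[2])].append(float(parts[0]))
    return flows


def beacon_candidates(flows, min_events=6, max_cv=0.1):
    """Return (cv, mean_interval, src, dst) for low-jitter pairs, most regular first."""
    hits = []
    for (src, dst), times in flows.items():
        if len(times) < min_events:
            continue
        times.sort()
        gaps = [b - a for a, b in zip(times, times[1:])]
        mean = statistics.mean(gaps)
        if mean <= 0:
            continue
        cv = statistics.pstdev(gaps) / mean  # coefficient of variation
        if cv <= max_cv:
            hits.append((cv, mean, src, dst))
    return sorted(hits)


if __name__ == "__main__":
    for cv, mean, src, dst in beacon_candidates(load_events(sys.stdin)):
        print(f"{src} -> {dst}: every {mean:.1f}s (cv={cv:.3f})")
```

Using the coefficient of variation rather than raw standard deviation lets one threshold cover both 10-second and 1-hour beacons.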

Phase 5: Automation

Build evidence collection script
Set up weekly photo metadata audit cron job
Set up daily integrity baseline check
Build batch OCR processing function
Build metadata sanitization function
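The daily integrity baseline check could be prototyped in plain Python before settling on hashdeep's audit mode. This is a minimal sketch, assuming SHA-256 and a JSON file as the baseline store (both illustrative choices): `init` records hashes for every file under a root, and `check` reports added, removed, and modified paths, exiting non-zero on any difference so a scheduler can alert on it.

```python
"""Create or check a SHA-256 baseline for a directory tree."""
import hashlib
import json
import sys
from pathlib import Path


def hash_file(path, bufsize=1 << 20):
    """Hash a file in 1 MiB chunks so large files don't load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        while chunk := fh.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root):
    """Map each regular file's path (relative to root) to its SHA-256."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(baseline, current):
    """Return added, removed, and modified relative paths."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    modified = sorted(
        k for k in baseline.keys() & current.keys() if baseline[k] != current[k]
    )
    return {"added": added, "removed": removed, "modified": modified}


if __name__ == "__main__":
    # usage: baseline.py init|check ROOT BASELINE.json
    mode, root, db = sys.argv[1], sys.argv[2], Path(sys.argv[3])
    if mode == "init":
        db.write_text(json.dumps(snapshot(root), indent=2))
    else:
        diff = compare(json.loads(db.read_text()), snapshot(root))
        print(json.dumps(diff, indent=2))
        raise SystemExit(1 if any(diff.values()) else 0)
```

Storing the baseline outside the monitored tree avoids the baseline file reporting itself as modified.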

Phase 6: AI Integration

Test document summarization pipeline with Ollama
Test image classification with LLaVA
Build log analysis pipeline
Explore embedding-based file clustering
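For the summarization pipeline, one workable pattern is map-reduce: split a long document into chunks, summarize each through Ollama's local HTTP API, then merge the partial summaries. This sketch assumes Ollama is serving on its default port 11434 and uses the /api/generate endpoint with streaming disabled; the model name "llama3", the prompts, and the 4000-character chunk size are hypothetical placeholders. A single paragraph longer than the chunk size is kept whole, so very long paragraphs may still exceed the model's context.

```python
"""Map-reduce document summarisation against a local Ollama server."""
import json
import sys
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama port
MODEL = "llama3"  # hypothetical; use whichever model is pulled locally


def chunk_text(text, max_chars=4000):
    """Group paragraphs into chunks of at most ~max_chars characters."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_payload(prompt, model=MODEL):
    """Request body for a single non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, model=MODEL, url=OLLAMA_URL):
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]


def summarize(text, model=MODEL):
    partials = [
        generate(f"Summarize this excerpt in 3 bullet points:\n\n{c}", model)
        for c in chunk_text(text)
    ]
    if len(partials) <= 1:
        return partials[0] if partials else ""
    joined = "\n\n".join(partials)
    return generate(f"Combine these partial summaries into one:\n\n{joined}", model)


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(summarize(fh.read()))
```

Feeding it the pdftotext/tesseract output from Phase 1 connects the document pipeline to the AI stage without extra glue.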