Post-Setup TODOs
Phase 0: Toolchain
- Install poppler, tesseract, exiftool, imagemagick (pacman)
- Install sleuthkit, the full TSK suite (pacman)
- Install binwalk (pacman)
- Install tshark, tcpdump, ngrep (pacman)
- Install photorec/testdisk (pacman)
- Install hashdeep (AUR)
- Install fdupes (pacman)
- Validate all 21 tools with version check
- Live test: exiftool + binwalk + strings + identify on personal photo
- Install AUR extras: dc3dd, libewf, scalpel, foremost, bulk_extractor, volatility3, findimagedupes, tcpflow
- Create codex entries for each tool category
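A possible starting point for the tool validation step, sketched in Python. It only checks that each binary is on PATH; the version flag differs per tool, so the actual version query is left out. The TOOLS list is an assumed, partial set of binary names (for example `fls` for sleuthkit and `pdftotext` for poppler) and should be extended to the full 21.

```python
import shutil

# Assumed, partial list of binaries provided by the Phase 0 packages.
TOOLS = ["pdftotext", "tesseract", "exiftool", "identify", "fls",
         "binwalk", "tshark", "tcpdump", "ngrep", "photorec", "fdupes"]


def locate_tools(names):
    """Map each tool name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the tools that are not on PATH, in the order they were given."""
    found = locate_tools(names)
    return [name for name in names if found[name] is None]
```

Calling `missing_tools(TOOLS)` after the installs should return an empty list; anything it returns still needs installing or a corrected binary name.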
Phase 1: Document Pipeline
- Test pdftotext on sample document library
- Test tesseract OCR on scanned documents
- Build batch extraction pipeline script
- Tune tesseract PSM modes for common scan types
- Test pandoc conversions (docx, epub, html)
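One way the batch extraction script could route files, as a rough sketch. It assumes PDFs with a text layer go to `pdftotext -layout` and image scans go to `tesseract` with a configurable `--psm` value; the function name, the suffix set, and the output layout are illustrative choices, not a settled design.

```python
from pathlib import Path

# Assumed set of scan formats; adjust to what the document library contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def extraction_command(src, out_dir, psm=3):
    """Build the argv for extracting text from one file, or None if unsupported.

    psm is tesseract's page segmentation mode (3 is its default; 6 treats the
    page as a single uniform block of text).
    """
    src = Path(src)
    suffix = src.suffix.lower()
    if suffix == ".pdf":
        out_file = Path(out_dir) / (src.stem + ".txt")
        return ["pdftotext", "-layout", str(src), str(out_file)]
    if suffix in IMAGE_SUFFIXES:
        # tesseract takes an output base name and appends ".txt" itself.
        out_base = Path(out_dir) / src.stem
        return ["tesseract", str(src), str(out_base), "--psm", str(psm)]
    return None
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes it easy to compare PSM modes on the same scans.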
Phase 2: Image Pipeline
- Run full exiftool audit on personal photo library
- Generate geolocation report
- Run fdupes dedup pass
- Test perceptual hash dedup with findimagedupes
- Build photo sanitization script for sharing
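The geolocation report could start from exiftool's JSON output. The sketch below assumes the audit was run as `exiftool -json -n -GPSLatitude -GPSLongitude -r <photo dir>`, where `-n` yields signed decimal coordinates; the function name and the tuple layout are illustrative.

```python
import json


def geotagged(exiftool_json):
    """Return (file, latitude, longitude) for every record with GPS tags.

    Expects the text produced by `exiftool -json -n`, which is a JSON array
    of objects, each carrying a "SourceFile" key.
    """
    rows = []
    for record in json.loads(exiftool_json):
        lat = record.get("GPSLatitude")
        lon = record.get("GPSLongitude")
        if lat is not None and lon is not None:
            rows.append((record["SourceFile"], float(lat), float(lon)))
    return rows
```

Any file that shows up in this list is a candidate for the sanitization script before sharing.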
Phase 3: System & Disk Forensics
- Create test disk image with dd
- Practice E01 imaging with ewfacquire
- Run fls + mactime timeline on test image
- Practice file carving with photorec/foremost
- Test bulk_extractor artifact extraction
- Build EnCase-equivalent workflow runbook
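The fls + mactime step might be wrapped as below. This is a sketch under the assumption that the test image holds a bare file system; an image with a partition table would also need fls's `-o` sector offset, which is omitted here. The function names and file arguments are hypothetical.

```python
import subprocess


def timeline_commands(image, body_file, mount_point="/"):
    """Return the fls and mactime argv lists for a body-file timeline.

    fls -r recurses into directories; -m writes mactime body format with the
    given mount point prefixed to paths. mactime -b reads the body file and
    -d emits comma-delimited output.
    """
    fls_cmd = ["fls", "-r", "-m", mount_point, image]
    mactime_cmd = ["mactime", "-b", body_file, "-d"]
    return fls_cmd, mactime_cmd


def build_timeline(image, body_file, timeline_csv):
    """Run fls into body_file, then mactime into timeline_csv."""
    fls_cmd, mactime_cmd = timeline_commands(image, body_file)
    with open(body_file, "w") as body:
        subprocess.run(fls_cmd, stdout=body, check=True)
    with open(timeline_csv, "w") as out:
        subprocess.run(mactime_cmd, stdout=out, check=True)
```

Keeping the body file on disk between the two steps means the timeline can be regenerated with different mactime options without re-reading the image.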
Phase 4: Network Forensics
- Capture sample traffic with tcpdump
- Practice tshark field extraction
- Reconstruct TCP sessions with tcpflow
- Build beaconing detection pipeline
- Test DNS tunneling detection
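For the beaconing pipeline, one simple heuristic is to flag destinations whose connection times are unusually regular. The sketch below scores a list of epoch timestamps (for example from `tshark -r capture.pcap -T fields -e frame.time_epoch -e ip.dst`, grouped per destination) by the coefficient of variation of the gaps between them. The thresholds are assumptions to tune, and this is only one of several possible detection approaches.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between sorted timestamps.

    Returns None when there are fewer than two gaps or the mean gap is zero.
    Values near 0 mean very regular timing.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    average = mean(gaps)
    if average == 0:
        return None
    return pstdev(gaps) / average


def looks_like_beacon(timestamps, max_cv=0.1, min_events=5):
    """Heuristic: enough events and nearly constant spacing between them."""
    if len(timestamps) < min_events:
        return False
    cv = interval_cv(timestamps)
    return cv is not None and cv <= max_cv
```

Real beacons often add jitter, so `max_cv` likely needs loosening against captured samples, at the cost of more false positives from periodic but benign traffic such as update checks.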
Phase 5: Automation
- Build evidence collection script
- Set up weekly photo metadata audit cron
- Set up daily integrity baseline check
- Build batch OCR processing function
- Build metadata sanitization function
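The daily integrity check could be prototyped in plain Python before wiring it to cron. The sketch below uses hashlib rather than hashdeep (whose audit mode covers the same ground), so it is a stand-in for experimenting with the workflow, not the hashdeep-based setup itself. All function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large evidence files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_baseline(baseline, current):
    """Compare two snapshots and list added, removed, and changed paths."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        key for key in set(baseline) & set(current)
        if baseline[key] != current[key]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

A daily job would load yesterday's snapshot (for example from a JSON file), call `snapshot()` again, and report any non-empty list from `diff_baseline()`.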
Phase 6: AI Integration
- Test document summarization pipeline with Ollama
- Test image classification with LLaVA
- Build log analysis pipeline
- Explore embedding-based file clustering
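A minimal sketch of the summarization test against a local Ollama server, assuming the default endpoint on port 11434 and a non-streaming call to `/api/generate`. The model name `llama3` and the prompt wording are placeholders; substitute whichever model has been pulled locally.

```python
import json
import urllib.request

# Ollama's default local endpoint; change if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_summary_request(text, model="llama3"):
    """Build a POST request asking the model to summarize text.

    The model name is a placeholder assumption, not a recommendation.
    """
    payload = {
        "model": model,
        "prompt": "Summarize the following document in five bullet points:\n\n" + text,
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(text, model="llama3", timeout=120):
    """Send the request and return the generated text from the reply."""
    request = build_summary_request(text, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Feeding `summarize()` the text files produced by the Phase 1 extraction step would connect the two pipelines. Long documents may exceed the model's context window and need chunking first.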