PRJ: Data Forensics
Project Summary
CLI-first toolkit for data extraction and forensic analysis. The goal is to extract truth from opaque data (documents, photos, filesystems, memory, network captures) using terminal tools exclusively. It spans both work (incident response, evidence handling) and personal use (photo library audit, system integrity, metadata hygiene).
The unifying principle: everything you touch produces data you should be able to interrogate from the CLI.
Status
| Phase | Description | Status | Notes |
|---|---|---|---|
| 0: Toolchain | Install, configure, validate all tools | ✅ Done | 21 tools verified, live test on personal photo completed 2026-04-19 |
| 1: Document Pipeline | PDF extraction, OCR, batch processing | ❌ Not started | pdftotext, tesseract, pandoc |
| 2: Image Pipeline | Metadata extraction, dedup, geo-audit | ❌ Not started | exiftool, ImageMagick, fdupes |
| 3: System & Disk Forensics | TSK, memory forensics, timeline analysis | ❌ Not started | EnCase-equivalent CLI workflows |
| 4: Network Forensics | Packet capture, protocol analysis, carving | ❌ Not started | tshark, tcpdump, ngrep |
| 5: Automation | Shell pipelines, scheduled audits, batch jobs | ❌ Not started | find + xargs + awk chains |
| 6: AI Integration | Local model ingestion, classification, summarization | ❌ Not started | Ollama + extracted data |

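
Phase 0 above covers validating that the tools are installed. As a rough illustration of what such a check could look like, here is a minimal POSIX shell sketch. `check_tools` is a hypothetical helper name and not part of the project. The example tool list is an assumption drawn from the Notes column, limited to entries whose binary name matches the tool name.

```shell
# Hypothetical helper (not from the project): report which tools are on PATH.
# Prints one "ok <tool>" or "missing <tool>" line per argument and
# returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command resolves.
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'ok      %s\n' "$tool"
        else
            printf 'missing %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example invocation (assumed binary names, taken from the Notes column):
#   check_tools pdftotext tesseract pandoc exiftool fdupes tshark tcpdump ngrep
```

The function only confirms that a command resolves on `PATH`. It does not check versions or confirm that a tool actually works, so a live test like the one recorded for Phase 0 is still needed.
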

| Field | Value |
|---|---|
| PRJ ID | PRJ-2026-04-data-forensics |
| Author | Evan Rosado |
| Created | 2026-04-19 |
| Updated | 2026-04-19 |
| Phase | 0 Complete (toolchain verified) |
| Status | Active |
| Category | Data Analysis / Digital Forensics |
| Priority | P1 - High |
| Scope | Universal (Work + Personal) |