In-House ETL Alternative
Summary
| Assessment | In-House ETL Alternative to Monad |
|---|---|
| Status | Complete - not selected |
| Approach | Build the ETL pipeline in-house using open-source tools (rsyslog, Vector) and Microsoft’s native capabilities (AMA + DCR) |
| Rationale | Monad has no native ISE/FTD connectors, requiring a custom syslog layer regardless; in-house provides full control over data flow and compliance without adding a vendor to a healthcare environment |
Overview
Alternative to Monad: build the ETL pipeline in-house using open-source tools and Microsoft’s native capabilities.
Rationale:
- Monad has no native ISE/FTD connectors - you build the syslog layer anyway
- Microsoft DCR supports KQL filtering at ingestion time
- Full control over data flow and compliance
- No additional vendor in healthcare environment
Architecture Options
Option 1: rsyslog + Azure Monitor Agent
Leverage existing Linux syslog infrastructure with Microsoft’s native agent.
ISE/FTD/Network
        │ syslog (UDP/TCP 514)
        ▼
┌─────────────────────────────────┐
│ Linux Syslog Collector          │
│ (rsyslog or syslog-ng)          │
│                                 │
│ - Filter by facility/severity   │
│ - Write to local files          │
│ - Forward to AMA                │
└─────────────────────────────────┘
        │                   │
        ▼                   ▼
Azure Monitor Agent   S3/Blob Archive
        │               (compliance)
        ▼
Data Collection Rule (DCR)
        │ transformKql
        ▼
Microsoft Sentinel
(filtered, reduced volume)
Option 2: Vector Pipeline
Modern, Rust-based log processor with native transforms and multiple sinks.
ISE/FTD/Network
        │ syslog
        ▼
┌─────────────────────────────────┐
│ Vector                          │
│                                 │
│ Sources: syslog (UDP/TCP)       │
│ Transforms: filter, parse, VRL  │
│ Sinks: Sentinel, S3, Blob       │
└─────────────────────────────────┘
        │                   │
        ▼                   ▼
  Sentinel API        S3/Blob Archive
 (critical only)      (full archive)
Option 3: Pure Microsoft (Simplest)
No additional tooling - just AMA and DCR.
ISE/FTD/Network
│ syslog
▼
Linux VM with Azure Monitor Agent
│
▼
Data Collection Rule (DCR)
│
│ transformKql: |
│   source
│   | where SeverityLevel in ("err", "crit", "alert", "emerg")
│   | where ProcessName == "ise" and SyslogMessage contains "FAILED"
│
▼
Microsoft Sentinel
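The Option 3 flow maps onto the `dataSources`, `destinations`, and `dataFlows` sections of a DCR. The following is a minimal sketch of that `properties` body built as a Python dict, assuming the built-in `Microsoft-Syslog` stream. The workspace resource ID, the component names (`networkSyslog`, `sentinelWorkspace`), and the facility list are placeholders for illustration, not values from this environment.

```python
import json

# Hypothetical placeholder; substitute the real Log Analytics workspace resource ID.
WORKSPACE_RESOURCE_ID = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace>"
)


def build_syslog_dcr(transform_kql: str, facilities: list[str]) -> dict:
    """Return a DCR 'properties' body that applies transform_kql to syslog."""
    return {
        "dataSources": {
            "syslog": [
                {
                    "name": "networkSyslog",  # illustrative name
                    "streams": ["Microsoft-Syslog"],
                    "facilityNames": facilities,
                    # Collect every level at the agent; filtering is left to transformKql.
                    "logLevels": [
                        "Debug", "Info", "Notice", "Warning",
                        "Error", "Critical", "Alert", "Emergency",
                    ],
                }
            ]
        },
        "destinations": {
            "logAnalytics": [
                {
                    "name": "sentinelWorkspace",  # illustrative name
                    "workspaceResourceId": WORKSPACE_RESOURCE_ID,
                }
            ]
        },
        "dataFlows": [
            {
                "streams": ["Microsoft-Syslog"],
                "destinations": ["sentinelWorkspace"],
                "transformKql": transform_kql,
                "outputStream": "Microsoft-Syslog",
            }
        ],
    }


if __name__ == "__main__":
    kql = 'source | where SeverityLevel in ("err", "crit", "alert", "emerg")'
    print(json.dumps(build_syslog_dcr(kql, ["auth", "authpriv", "local0"]), indent=2))
```

Keeping the transform as a single string parameter makes it easy to try different filters against the same rule skeleton during validation.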
Tool Comparison
| Tool | Strengths | Weaknesses | Use Case |
|---|---|---|---|
| rsyslog | Ubiquitous, mature, powerful RainerScript | Complex syntax (legacy and RainerScript styles coexist) | Already deployed, simple filtering |
| syslog-ng | Clean config, good filtering | Less common than rsyslog | When rsyslog isn’t available |
| Vector | Fast (Rust), VRL transforms, multiple sinks | Newer, learning curve | Complex transforms, high volume |
| Fluent Bit | Lightweight, Kubernetes-native | Limited transforms | Edge/container environments |
| Logstash | Feature-rich, Elastic ecosystem | Heavy (JVM), resource hungry | Existing Elastic investment |
| AMA + DCR | Native Microsoft, no extra tooling | Limited to Azure Monitor/Sentinel destinations, KQL learning curve | Simplest path to Sentinel |
DCR Filtering Examples
Drop Informational Logs
source
| where SeverityLevel !in ("info", "debug", "notice")
Keep Only Authentication Failures
source
| where ProcessName contains "ise"
| where SyslogMessage contains "FAILED" or SyslogMessage contains "5400"
Filter by Facility
source
| where Facility in ("auth", "authpriv", "local0")
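Before deploying a transform, it can help to check what each filter keeps against a few representative records. The sketch below mirrors the three filters in Python, assuming the Syslog table column names `Facility`, `SeverityLevel`, `ProcessName`, and `SyslogMessage`. The sample records are invented for illustration and are not captured from real devices. KQL `contains` is case-insensitive while `in` is case-sensitive, and the helper functions reproduce that.

```python
# Hypothetical sample records, invented for illustration only.
SAMPLE_RECORDS = [
    {"Facility": "local0", "SeverityLevel": "notice",
     "ProcessName": "CISE_Failed_Attempts",
     "SyslogMessage": "5400 Authentication failed"},
    {"Facility": "local4", "SeverityLevel": "err",
     "ProcessName": "ftd",
     "SyslogMessage": "Interface GigabitEthernet0/1 down"},
    {"Facility": "auth", "SeverityLevel": "info",
     "ProcessName": "sshd",
     "SyslogMessage": "Accepted publickey for admin"},
]


def contains(haystack: str, needle: str) -> bool:
    """Mirror KQL `contains`: case-insensitive substring match."""
    return needle.lower() in haystack.lower()


def drop_informational(rec: dict) -> bool:
    """Mirror: where SeverityLevel !in ("info", "debug", "notice")."""
    return rec["SeverityLevel"] not in ("info", "debug", "notice")


def auth_failures(rec: dict) -> bool:
    """Mirror the two-step ISE authentication-failure filter."""
    return contains(rec["ProcessName"], "ise") and (
        contains(rec["SyslogMessage"], "FAILED")
        or contains(rec["SyslogMessage"], "5400")
    )


def by_facility(rec: dict) -> bool:
    """Mirror: where Facility in ("auth", "authpriv", "local0")."""
    return rec["Facility"] in ("auth", "authpriv", "local0")


if __name__ == "__main__":
    for name, keep in [("drop_informational", drop_informational),
                       ("auth_failures", auth_failures),
                       ("by_facility", by_facility)]:
        kept = [r["ProcessName"] for r in SAMPLE_RECORDS if keep(r)]
        print(f"{name}: keeps {kept}")
```

In this made-up sample the severity filter alone discards the notice-level failure record that the authentication filter is meant to keep, so the order and combination of filters should be checked against real device output before rollout.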
rsyslog Filtering Examples
Route by Severity
# /etc/rsyslog.d/50-sentinel.conf
# Severity 0-3 (emerg, alert, crit, err) → Sentinel
if $syslogseverity <= 3 then {
    action(type="omfwd" target="sentinel-collector" port="514" protocol="tcp")
}
# All logs → Archive
*.* action(type="omfile" file="/var/log/archive/all.log")
Filter ISE Logs
# Keep only ISE authentication failures
if $programname == 'ise' and $msg contains 'FAILED' then {
    action(type="omfwd" target="sentinel-collector" port="514" protocol="tcp")
}
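The `$syslogseverity <= 3` test works on the numeric severity carried in each message's priority value, where PRI = facility × 8 + severity. The sketch below builds minimal test messages and predicts whether the severity rule above would forward them. The helper names are hypothetical. The bare `<PRI>tag: text` form omits the timestamp and hostname, on the assumption that the collector fills those in, and the target host is a placeholder.

```python
import socket

# Standard syslog facility and severity codes (subset of facilities).
FACILITIES = {"auth": 4, "authpriv": 10, "local0": 16}
SEVERITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}


def build_message(facility: str, severity: str, tag: str, text: str) -> bytes:
    """Build a minimal syslog message: <PRI>tag: text."""
    pri = FACILITIES[facility] * 8 + SEVERITIES[severity]
    return f"<{pri}>{tag}: {text}".encode("utf-8")


def forwarded_by_severity_rule(severity: str) -> bool:
    """Mirror `if $syslogseverity <= 3`: emerg, alert, crit, err pass."""
    return SEVERITIES[severity] <= 3


def send_udp(message: bytes, host: str, port: int = 514) -> None:
    """Send one datagram to a collector (host is supplied by the caller)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))


if __name__ == "__main__":
    msg = build_message("local0", "err", "ise", "Authentication FAILED")
    print(msg, forwarded_by_severity_rule("err"))
    # send_udp(msg, "collector.example.internal")  # placeholder host; uncomment to test
```

Sending one message per severity level and then checking both the forwarded stream and `/var/log/archive/all.log` gives a quick end-to-end check of the routing rules.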
Vector Configuration Example
# vector.toml
[sources.syslog_input]
type = "syslog"
address = "0.0.0.0:514"
mode = "udp"

# The syslog source emits .severity as a name ("err", "info", ...), not a number
[transforms.filter_critical]
type = "filter"
inputs = ["syslog_input"]
condition = 'includes(["emerg", "alert", "crit", "err"], .severity) || contains(string!(.message), "FAILED")'

# Complement of filter_critical; not yet wired to a sink
[transforms.filter_bulk]
type = "filter"
inputs = ["syslog_input"]
condition = '!includes(["emerg", "alert", "crit", "err"], .severity) && !contains(string!(.message), "FAILED")'

[sinks.sentinel]
type = "azure_monitor_logs"
inputs = ["filter_critical"]
customer_id = "${WORKSPACE_ID}"
shared_key = "${SHARED_KEY}"
log_type = "SecurityLogs"

[sinks.s3_archive]
type = "aws_s3"
inputs = ["syslog_input"] # All logs
bucket = "log-archive"
region = "us-west-2"
encoding.codec = "json" # Serialization format for archived objects
Cost Comparison
| Approach | Additional Cost | Notes |
|---|---|---|
| Monad | SaaS subscription + Sentinel ingestion | Vendor dependency, no native ISE/FTD connectors |
| In-House (rsyslog/Vector) | Linux VM(s) + Sentinel ingestion | Staff time for setup/maintenance |
| Pure Microsoft (AMA/DCR) | Linux VM for AMA + Sentinel ingestion | Simplest, but less flexible filtering |
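The comparison above is qualitative. A rough way to put numbers on it is to compare ingestion cost with and without filtering and subtract any fixed monthly cost of the approach (VMs, subscription). The sketch below shows only the arithmetic: the function names are hypothetical, and every input in the usage example is a made-up placeholder, not a measured volume or a real Azure or vendor price.

```python
def monthly_ingestion_cost(gb_per_day: float, kept_fraction: float,
                           price_per_gb: float, days: int = 30) -> float:
    """Ingestion cost for the share of daily volume that reaches Sentinel."""
    return gb_per_day * kept_fraction * price_per_gb * days


def monthly_savings(gb_per_day: float, kept_fraction: float,
                    price_per_gb: float, fixed_monthly_cost: float = 0.0,
                    days: int = 30) -> float:
    """Unfiltered cost minus filtered cost minus the approach's fixed cost."""
    baseline = monthly_ingestion_cost(gb_per_day, 1.0, price_per_gb, days)
    filtered = monthly_ingestion_cost(gb_per_day, kept_fraction, price_per_gb, days)
    return baseline - filtered - fixed_monthly_cost


if __name__ == "__main__":
    # Placeholder inputs only; substitute measured volume and current pricing.
    print(monthly_savings(gb_per_day=100.0, kept_fraction=0.25,
                          price_per_gb=5.0, fixed_monthly_cost=300.0))
```

The same function can be evaluated once per row of the table by changing `kept_fraction` (how aggressively the approach can filter) and `fixed_monthly_cost` (VMs or subscription) once real figures are available.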
Recommendation
For CHLA network team scope (ISE, FTD, network devices):
| Phase | Approach |
|---|---|
| Start | Option 3 (AMA + DCR) - validate filtering works |
| Scale | Option 1 (rsyslog + AMA) - add central collector for complex routing |
| Advanced | Option 2 (Vector) - if transforms become complex or volume explodes |
Key point: Start simple. DCR filtering may be enough. Add complexity only when needed.
Open Questions
- Existing syslog infrastructure - What’s already deployed for log collection?
- Azure landing zone - Is AMA already in use elsewhere?
- Archive requirements - How long must logs be retained? Where?
- Team skills - rsyslog expertise vs. willingness to learn Vector/VRL?