In-House ETL Alternative

Summary:    Assessment of an in-house ETL alternative to Monad

Status:     Complete - not selected

Approach:   Build the ETL pipeline in-house using open-source tools (rsyslog, Vector) and Microsoft’s native capabilities (AMA + DCR).

Rationale:  Monad has no native ISE/FTD connectors, so a custom syslog layer is required regardless. An in-house pipeline provides full control over data flow and compliance without adding a vendor to a healthcare environment.

Overview

Alternative to Monad: build the ETL pipeline in-house using open-source tools and Microsoft’s native capabilities.

Rationale:

  • Monad has no native ISE/FTD connectors, so the syslog layer has to be built either way

  • Microsoft Data Collection Rules (DCRs) support KQL filtering at ingestion time via transformKql

  • Full control over data flow and compliance

  • No additional vendor in healthcare environment

Architecture Options

Option 1: rsyslog + Azure Monitor Agent

Leverage existing Linux syslog infrastructure with Microsoft’s native agent.

ISE/FTD/Network
      │ syslog (UDP/TCP 514)
      ▼
┌─────────────────────────────────┐
│  Linux Syslog Collector         │
│  (rsyslog or syslog-ng)         │
│                                 │
│  - Filter by facility/severity  │
│  - Write to local files         │
│  - Forward to AMA               │
└─────────────────────────────────┘
      │                    │
      ▼                    ▼
Azure Monitor Agent    S3/Blob Archive
      │                (compliance)
      ▼
Data Collection Rule (DCR)
      │ transformKql
      ▼
Microsoft Sentinel
(filtered, reduced volume)

Option 2: Vector Pipeline

Modern, Rust-based log processor with native transforms and multiple sinks.

ISE/FTD/Network
      │ syslog
      ▼
┌─────────────────────────────────┐
│  Vector                         │
│                                 │
│  Sources: syslog (UDP/TCP)      │
│  Transforms: filter, parse, VRL │
│  Sinks: Sentinel, S3, Blob      │
└─────────────────────────────────┘
      │                    │
      ▼                    ▼
  Sentinel API        S3/Blob Archive
  (critical only)     (full archive)

Option 3: Pure Microsoft (Simplest)

No additional tooling - just AMA and DCR.

ISE/FTD/Network
      │ syslog
      ▼
Linux VM with Azure Monitor Agent
      │
      ▼
Data Collection Rule (DCR)
      │
      │ transformKql: |
      │   source
      │   | where SeverityLevel in ("err", "crit", "alert", "emerg")
      │   | where ProcessName == "ise" and SyslogMessage contains "FAILED"
      │
      ▼
Microsoft Sentinel

Tool Comparison

Tool        Strengths                                    Weaknesses                               Use Case
----------  -------------------------------------------  ---------------------------------------  ----------------------------------
rsyslog     Ubiquitous, mature, powerful RainerScript    Complex configuration syntax             Already deployed, simple filtering
syslog-ng   Clean config, good filtering                 Less common than rsyslog                 When rsyslog isn’t available
Vector      Fast (Rust), VRL transforms, multiple sinks  Newer, learning curve                    Complex transforms, high volume
Fluent Bit  Lightweight, Kubernetes-native               Limited transforms                       Edge/container environments
Logstash    Feature-rich, Elastic ecosystem              Heavy (JVM), resource hungry             Existing Elastic investment
AMA + DCR   Native Microsoft, no extra infra             Limited to Sentinel, KQL learning curve  Simplest path to Sentinel

DCR Filtering Examples

Drop Informational Logs

source
| where SeverityLevel !in ("info", "debug", "notice")

Keep Only Authentication Failures

source
| where ProcessName contains "ise"
| where SyslogMessage contains "FAILED" or SyslogMessage contains "5400"

Filter by Facility

source
| where Facility in ("auth", "authpriv", "local0")
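
To show where a query like the ones above ends up, the following minimal sketch builds the dataFlows portion of a DCR as a Python dict and prints it as JSON. The property names (streams, destinations, transformKql, outputStream) follow the DCR JSON layout as commonly documented and should be checked against the current Azure schema. The helper build_data_flow and the destination name sentinel-workspace are hypothetical, and the dataSources and destinations sections a real DCR needs are omitted.

```python
import json


def build_data_flow(transform_kql: str, destination: str = "sentinel-workspace") -> dict:
    """Return one DCR dataFlow entry that filters the Syslog stream.

    `destination` is a hypothetical name; it must match an entry under the
    DCR's destinations.logAnalytics section, which is not shown here.
    """
    # Collapse the multi-line query into one line so the JSON stays readable.
    one_line = " ".join(transform_kql.split())
    return {
        "streams": ["Microsoft-Syslog"],
        "destinations": [destination],
        "transformKql": one_line,
        "outputStream": "Microsoft-Syslog",
    }


# The facility filter from the example above.
FACILITY_FILTER = """
source
| where Facility in ("auth", "authpriv", "local0")
"""

if __name__ == "__main__":
    print(json.dumps({"dataFlows": [build_data_flow(FACILITY_FILTER)]}, indent=2))
```

Keeping the query as a readable multi-line constant and generating the JSON makes it easier to review filter changes in version control than editing the escaped string by hand.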

rsyslog Filtering Examples

Route by Severity

# /etc/rsyslog.d/50-sentinel.conf

# Critical/Error → Sentinel
if $syslogseverity <= 3 then {
    action(type="omfwd" target="sentinel-collector" port="514" protocol="tcp")
}

# All logs → Archive
*.* action(type="omfile" file="/var/log/archive/all.log")
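
The numeric test $syslogseverity <= 3 relies on the standard syslog numbering, where 0 is emerg and 7 is debug, and where the <PRI> prefix of each message encodes facility * 8 + severity. The sketch below decodes that prefix and mirrors the forwarding rule so sample lines can be checked offline. The function names decode_pri and goes_to_sentinel are illustrative only.

```python
import re

# Standard syslog severity keywords, indexed by numeric severity (0 = most severe).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

_PRI = re.compile(r"^<(\d{1,3})>")


def decode_pri(line: str) -> tuple[int, int]:
    """Split the leading <PRI> value of a syslog line into (facility, severity)."""
    match = _PRI.match(line)
    if match is None:
        raise ValueError("line does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # 23 * 8 + 7 is the largest valid value
        raise ValueError(f"PRI out of range: {pri}")
    return divmod(pri, 8)  # PRI = facility * 8 + severity


def goes_to_sentinel(line: str) -> bool:
    """Mirror the rsyslog rule: forward severities 0-3 (emerg through err)."""
    _, severity = decode_pri(line)
    return severity <= 3


if __name__ == "__main__":
    sample = "<34>Oct 11 22:14:15 host su: 'su root' failed"
    facility, severity = decode_pri(sample)
    print(facility, SEVERITIES[severity], goes_to_sentinel(sample))
```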

Filter ISE Logs

# Keep only ISE authentication failures
if $programname == 'ise' and $msg contains 'FAILED' then {
    action(type="omfwd" target="sentinel-collector" port="514" protocol="tcp")
}
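
One way to check the ISE filter before pointing real devices at the collector is to send one matching and one non-matching message and confirm that only the first is forwarded. The sketch below assumes rsyslog is listening on UDP 514 on the host where the script runs. The address 127.0.0.1, the host name testhost, and the message texts are placeholders, not real ISE output.

```python
import socket
import sys
from datetime import datetime
from typing import Iterable, Optional


def build_message(severity: int, tag: str, text: str, facility: int = 16,
                  host: str = "testhost", now: Optional[datetime] = None) -> bytes:
    """Build an RFC 3164-style line: <PRI>Mmm dd HH:MM:SS host tag: text."""
    now = now or datetime.now()
    pri = facility * 8 + severity  # facility 16 = local0
    # RFC 3164 pads single-digit days with a space, e.g. "Oct  5".
    stamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    return f"<{pri}>{stamp} {host} {tag}: {text}".encode("ascii")


def send(payloads: Iterable[bytes], address=("127.0.0.1", 514)) -> None:
    """Send each payload as one UDP datagram (address is a placeholder)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for payload in payloads:
            sock.sendto(payload, address)


if __name__ == "__main__":
    tests = [
        build_message(3, "ise", "Authentication FAILED for user test"),     # matches the filter
        build_message(6, "ise", "Authentication succeeded for user test"),  # does not match
    ]
    for payload in tests:
        print(payload.decode("ascii"))
    if "--send" in sys.argv:  # dry run unless explicitly asked to send
        send(tests)
```

Run without arguments, the script only prints the lines it would send. With --send it transmits them, after which the forwarding target should show the first message and not the second.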

Vector Configuration Example

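# vector.toml

[sources.syslog_input]
type = "syslog"
address = "0.0.0.0:514"
mode = "udp"

# Vector's syslog source emits .severity as a keyword ("err", "info", ...),
# so match on names rather than comparing against a number.
[transforms.filter_critical]
type = "filter"
inputs = ["syslog_input"]
condition = 'includes(["emerg", "alert", "crit", "err"], .severity) || contains(string!(.message), "FAILED")'

# Complement of filter_critical. Not wired to a sink below; attach one if
# bulk logs need a destination other than the full archive.
[transforms.filter_bulk]
type = "filter"
inputs = ["syslog_input"]
condition = '!includes(["emerg", "alert", "crit", "err"], .severity) && !contains(string!(.message), "FAILED")'

[sinks.sentinel]
type = "azure_monitor_logs"
inputs = ["filter_critical"]
customer_id = "${WORKSPACE_ID}"
shared_key = "${SHARED_KEY}"
log_type = "SecurityLogs"

[sinks.s3_archive]
type = "aws_s3"
inputs = ["syslog_input"]  # All logs
bucket = "log-archive"
region = "us-west-2"
encoding.codec = "json"  # the aws_s3 sink needs an explicit encoding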

Cost Comparison

Approach                   Additional Cost                         Notes
-------------------------  --------------------------------------  -------------------------------------
Monad                      SaaS subscription + Sentinel ingestion  Vendor dependency, no native ISE/FTD
In-House (rsyslog/Vector)  Linux VM(s) + Sentinel ingestion        Staff time for setup/maintenance
Pure Microsoft (AMA/DCR)   Sentinel ingestion only                 Simplest, but less flexible filtering
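
Because every approach pays for Sentinel ingestion, the main cost lever is how much volume filtering removes before ingestion. The back-of-envelope sketch below makes that arithmetic explicit. All inputs (100 GB/day raw volume, 75% dropped, 2.0 currency units per GB) are hypothetical placeholders, not measured figures or Microsoft pricing.

```python
def monthly_ingestion_cost(raw_gb_per_day: float, drop_fraction: float,
                           price_per_gb: float, days: int = 30) -> float:
    """Estimate monthly ingestion cost after filtering.

    drop_fraction is the share of raw volume removed before ingestion (0.0 to 1.0).
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be between 0 and 1")
    ingested_gb = raw_gb_per_day * (1.0 - drop_fraction) * days
    return ingested_gb * price_per_gb


if __name__ == "__main__":
    # Placeholder inputs only; substitute measured volume and the contracted price.
    unfiltered = monthly_ingestion_cost(100, 0.0, 2.0)
    filtered = monthly_ingestion_cost(100, 0.75, 2.0)
    print(f"unfiltered: {unfiltered:.2f}, filtered: {filtered:.2f}")
```

The same function applies to all three rows of the table. Only the fixed costs (subscription, VMs, staff time) differ, and those have to be added separately.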

Recommendation

For the CHLA network team’s scope (ISE, FTD, network devices):

Phase     Approach
--------  ----------------------------------------------------------------------
Start     Option 3 (AMA + DCR) - validate filtering works
Scale     Option 1 (rsyslog + AMA) - add central collector for complex routing
Advanced  Option 2 (Vector) - if transforms become complex or volume explodes

Key point: Start simple. DCR filtering may be enough. Add complexity only when needed.

Open Questions

  1. Existing syslog infrastructure - What’s already deployed for log collection?

  2. Azure landing zone - Is AMA already in use elsewhere?

  3. Archive requirements - How long must logs be retained? Where?

  4. Team skills - rsyslog expertise vs. willingness to learn Vector/VRL?