Data Pipelines

Body of Knowledge

| Topic | Description | Relevance | Career Tracks |
| --- | --- | --- | --- |
| API Data Pipelines | Data extraction, transformation, and loading from REST APIs, including filesystem scanning, document parsing, JSON serialization, and integration with network management APIs | High | Data Engineer, Backend Engineer, Integration Engineer |
| ETL Fundamentals | Extract-Transform-Load process design, batch processing, scheduling, error handling | Critical | Data Engineer, Analytics Engineer |
| ELT Patterns | Extract-Load-Transform with in-database transformations, dbt, warehouse-centric processing | High | Data Engineer, Analytics Engineer |
| Stream Processing | Real-time data processing, Kafka Streams, Flink, windowing, exactly-once semantics | High | Data Engineer, Backend Engineer |
| Workflow Orchestration | Airflow, Prefect, Dagster, DAG design, task dependencies, scheduling | Critical | Data Engineer, Platform Engineer |
| Change Data Capture | Database CDC, Debezium, log-based replication, event sourcing for pipelines | High | Data Engineer, Backend Engineer |
| Data Quality | Data validation, schema enforcement, anomaly detection, data contracts | Critical | Data Engineer, Analytics Engineer |
| Pipeline Monitoring | Pipeline observability, SLAs, alerting, data freshness, lineage tracking | High | Data Engineer, SRE |
| Incremental Processing | Incremental loads, watermarks, late-data handling, idempotent pipelines | High | Data Engineer, Analytics Engineer |
| Pipeline Testing | Data pipeline testing, data assertions, integration tests, data mocking | High | Data Engineer, QA Engineer |
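
Illustrative Sketches

The sketches below illustrate selected topics from the table above. They are minimal, hedged examples, not reference implementations: every endpoint, table name, schema, and data value in them is invented.

For API data pipelines, a single extract-and-serialize step might look like the following; the endpoint URL, bearer token, and response shape are assumptions, not a real API.

```python
"""Minimal REST-to-JSON extract sketch; the endpoint, token, and
response field names are hypothetical placeholders."""
import json

import requests

API_URL = "https://example.invalid/api/v1/devices"  # hypothetical endpoint

def extract_devices(token: str) -> list[dict]:
    """Pull one page of records from a REST API and keep selected fields."""
    resp = requests.get(
        API_URL, headers={"Authorization": f"Bearer {token}"}, timeout=30
    )
    resp.raise_for_status()  # fail loudly on HTTP errors
    return [
        {"id": item["id"], "name": item["name"]}  # assumed response shape
        for item in resp.json()["results"]
    ]

def load_to_file(records: list[dict], path: str) -> None:
    """Serialize extracted records as JSON Lines for downstream steps."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")

if __name__ == "__main__":
    # Would run against a real endpoint with a real token.
    load_to_file(extract_devices(token="..."), "devices.jsonl")
```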
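
For ETL fundamentals, a batch skeleton with per-record error handling might quarantine bad rows in a dead-letter list instead of failing the whole run; the sample CSV and transform rule are invented.

```python
"""Batch ETL skeleton with per-record error handling; the sample rows
and transform rule are invented for illustration."""
import csv
import io

SAMPLE_CSV = "user_id,country\n1, us \nX,\n"  # second data row is malformed

def transform(row: dict) -> dict:
    # Cast and normalize one record; raises on bad input.
    return {"user_id": int(row["user_id"]), "country": row["country"].strip().upper()}

def run_batch(rows) -> tuple[list[dict], list[dict]]:
    """Transform each row, quarantining failures instead of aborting the batch."""
    good, dead_letter = [], []
    for row in rows:
        try:
            good.append(transform(row))
        except (KeyError, ValueError):
            dead_letter.append(row)  # keep bad rows for inspection and replay
    return good, dead_letter

good, dead = run_batch(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(good)  # [{'user_id': 1, 'country': 'US'}]
print(dead)  # [{'user_id': 'X', 'country': ''}]
```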
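
For ELT patterns, the defining move is loading raw data first and transforming it inside the database; here sqlite3 stands in for a warehouse, and in practice the transform SQL would be managed by a tool like dbt. Table names and values are made up.

```python
"""ELT sketch: load raw rows first, then transform in-database with SQL.
sqlite3 stands in for a real warehouse; tables and values are invented."""
import sqlite3

conn = sqlite3.connect(":memory:")

# "E" and "L": land the raw data untouched.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 1999), (2, 525)])

# "T": the transformation runs inside the database, the pattern dbt builds on.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT id, amount_cents / 100.0 AS amount_usd
    FROM raw_orders
    """
)
print(conn.execute("SELECT * FROM orders").fetchall())  # [(1, 19.99), (2, 5.25)]
```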
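
For stream processing, real deployments would use Kafka Streams or Flink; this pure-Python tumbling window only shows the windowing idea on invented events (timestamp, key).

```python
"""Tumbling-window aggregation sketch in plain Python; a stand-in for
the windowing primitives of Kafka Streams or Flink. Events are invented."""
from collections import defaultdict

WINDOW_SECONDS = 60

def tumbling_counts(events: list[tuple[int, str]]) -> dict[int, dict[str, int]]:
    """Count events per key within fixed, non-overlapping time windows."""
    windows: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = ts - (ts % WINDOW_SECONDS)  # assign event to its window
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in windows.items()}

print(tumbling_counts([(3, "login"), (61, "login"), (62, "click")]))
# {0: {'login': 1}, 60: {'login': 1, 'click': 1}}
```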
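
For workflow orchestration, a minimal Airflow DAG might wire two tasks into a dependency chain; this assumes Airflow 2.4+ (for the `schedule` argument), and the DAG id and task bodies are placeholders.

```python
"""Minimal Airflow DAG sketch (assumes Airflow 2.4+); task bodies are
placeholders showing dependency wiring, not a real pipeline."""
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():  # placeholder task body
    print("extract")

def load():  # placeholder task body
    print("load")

with DAG(
    dag_id="daily_example",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load  # load runs only after extract succeeds
```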
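
For change data capture, the consumer side replays a log of change events onto a target; the op/after envelope below mirrors the shape of Debezium's change events, but the events and key handling are simplified inventions.

```python
"""Sketch of replaying CDC change events onto an in-memory replica; the
op/after envelope mirrors Debezium's format, but the data is invented."""
def apply_change(state: dict, event: dict) -> None:
    """Replay one log-based change event onto the replica."""
    op, key = event["op"], event["key"]
    if op in ("c", "u"):   # create / update carry the new row in "after"
        state[key] = event["after"]
    elif op == "d":        # delete carries no "after"
        state.pop(key, None)

replica: dict = {}
for ev in [
    {"op": "c", "key": 1, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "u", "key": 1, "after": {"id": 1, "email": "b@example.com"}},
    {"op": "d", "key": 1},
]:
    apply_change(replica, ev)
print(replica)  # {} -- create, then update, then delete
```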
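
For data quality, a lightweight data-contract check can report every violation in a record rather than stopping at the first; the contract fields and rules here are invented.

```python
"""Data-contract check sketch; the contract and value rules are invented
to show the validation pattern, not a real contract."""
CONTRACT = {"user_id": int, "email": str}  # hypothetical schema contract

def violations(row: dict) -> list[str]:
    """Return human-readable contract violations for one record."""
    problems = [f"missing field: {f}" for f in CONTRACT if f not in row]
    problems += [
        f"bad type for {f}: expected {t.__name__}"
        for f, t in CONTRACT.items()
        if f in row and not isinstance(row[f], t)
    ]
    if isinstance(row.get("email"), str) and "@" not in row["email"]:
        problems.append("email has no '@'")  # example of a value-level rule
    return problems

print(violations({"user_id": "7", "email": "nope"}))
# ["bad type for user_id: expected int", "email has no '@'"]
```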
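
For pipeline monitoring, a freshness check compares the newest loaded timestamp against an SLA; the threshold and timestamp below are illustrative stand-ins for values a real monitor would query from the warehouse.

```python
"""Data-freshness check sketch; the SLA threshold and last-load timestamp
are stand-ins for values a real monitor would query."""
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(hours=2)  # hypothetical SLA

def is_stale(last_loaded_at: datetime, now: datetime | None = None) -> bool:
    """Alert condition: the newest loaded row is older than the SLA allows."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at > FRESHNESS_SLA

last = datetime.now(timezone.utc) - timedelta(hours=3)
print(is_stale(last))  # True -> would fire an alert
```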
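
For incremental processing, a high-watermark load selects only rows newer than the stored watermark and then advances it, which makes re-runs idempotent; the rows and watermark store are in-memory stand-ins for a source table and state backend.

```python
"""High-watermark incremental load sketch; rows and the watermark are
in-memory stand-ins for a source table and a state backend."""
def incremental_load(rows: list[dict], watermark: str) -> tuple[list[dict], str]:
    """Select rows newer than the watermark, then advance it.
    Re-running with the same inputs selects nothing, so the load is idempotent."""
    new = [r for r in rows if r["updated_at"] > watermark]  # ISO dates sort lexically
    new_watermark = max((r["updated_at"] for r in new), default=watermark)
    return new, new_watermark

rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-03"},
]
batch, wm = incremental_load(rows, watermark="2024-01-02")
print(batch, wm)  # only id=2 is picked up; watermark advances to 2024-01-03
```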
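
For pipeline testing, keeping transforms as pure functions lets them be unit-tested against tiny fixtures; the transform and fixture below are invented, in pytest style (run with `pytest`).

```python
"""Pipeline-test sketch in pytest style: a pure transform exercised
against a tiny fixture; the transform and fixture are invented."""
def dedupe_latest(rows: list[dict]) -> list[dict]:
    """Keep the newest row per id -- a typical unit-testable transform."""
    latest: dict = {}
    for row in sorted(rows, key=lambda r: r["updated_at"]):
        latest[row["id"]] = row  # later (newer) rows overwrite earlier ones
    return list(latest.values())

def test_dedupe_keeps_newest_row():
    rows = [
        {"id": 1, "updated_at": "2024-01-01", "v": "old"},
        {"id": 1, "updated_at": "2024-01-02", "v": "new"},
    ]
    assert dedupe_latest(rows) == [
        {"id": 1, "updated_at": "2024-01-02", "v": "new"}
    ]
```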

Personal Status

| Topic | Level | Evidence | Active Projects | Gaps |
| --- | --- | --- | --- | --- |
| API Data Pipelines | Advanced | domus-api: filesystem scanning, AsciiDoc parsing, JSON serialization for 3,486 files; ISE ERS API data extraction via netapi; curl-based data collection scripts | domus-api, netapi | No streaming pipelines, no message queues (Kafka, RabbitMQ), no event-driven processing |