Data Storage

Body of Knowledge

| Topic | Description | Relevance | Career Tracks |
|---|---|---|---|
| Object Storage | S3, MinIO, GCS, blob storage patterns, lifecycle policies, versioning | Critical | Data Engineer, Cloud Engineer, DevOps Engineer |
| Data Lakes | Lake architecture, raw/curated zones, medallion pattern, data catalogs | High | Data Engineer, Data Architect |
| Data Warehouses | Snowflake, BigQuery, Redshift, warehouse optimization, workload management | High | Data Engineer, Analytics Engineer |
| Columnar Storage | Parquet, ORC, Arrow, column-oriented optimization, compression | High | Data Engineer, Analytics Engineer |
| File Formats | CSV, JSON, Parquet, Avro, format selection criteria, performance trade-offs | High | Data Engineer, Backend Developer |
| Data Partitioning | Partition strategies, partition pruning, bucketing, optimization | High | Data Engineer, Analytics Engineer |
| Caching Layers | Redis, Memcached, caching strategies, cache invalidation, TTL management | High | Backend Developer, Data Engineer |
| Data Archiving | Cold storage, archival strategies, data retention, compliance requirements | Medium | Data Engineer, Infrastructure Engineer |
| HDFS & Distributed Storage | Hadoop ecosystem, HDFS architecture, data locality, replication | Medium | Data Engineer, Infrastructure Engineer |
| Delta Lake & Iceberg | Table formats, ACID transactions, time travel, schema evolution | High | Data Engineer, Analytics Engineer |

Personal Status

| Topic | Level | Evidence | Active Projects | Gaps |
|---|---|---|---|---|
| To be populated | — | — | — | — |