Data Storage
Body of Knowledge
| Topic | Description | Relevance | Career Tracks |
|---|---|---|---|
| Object Storage | S3, MinIO, GCS, blob storage patterns, lifecycle policies, versioning | Critical | Data Engineer, Cloud Engineer, DevOps Engineer |
| Data Lakes | Lake architecture, raw/curated zones, medallion pattern, data catalogs | High | Data Engineer, Data Architect |
| Data Warehouses | Snowflake, BigQuery, Redshift, warehouse optimization, workload management | High | Data Engineer, Analytics Engineer |
| Columnar Storage | Parquet, ORC, Arrow, column-oriented optimization, compression | High | Data Engineer, Analytics Engineer |
| File Formats | CSV, JSON, Parquet, Avro, format selection criteria, performance trade-offs | High | Data Engineer, Backend Developer |
| Data Partitioning | Partition strategies, partition pruning, bucketing, optimization | High | Data Engineer, Analytics Engineer |
| Caching Layers | Redis, Memcached, caching strategies, cache invalidation, TTL management | High | Backend Developer, Data Engineer |
| Data Archiving | Cold storage, archival strategies, data retention, compliance requirements | Medium | Data Engineer, Infrastructure Engineer |
| HDFS & Distributed Storage | Hadoop ecosystem, HDFS architecture, data locality, replication | Medium | Data Engineer, Infrastructure Engineer |
| Delta Lake & Iceberg | Table formats, ACID transactions, time travel, schema evolution | High | Data Engineer, Analytics Engineer |
Personal Status
| Topic | Level | Evidence | Active Projects | Gaps |
|---|---|---|---|---|
| To be populated | — | — | — | — |