# Writing <!-- 12 posts -->

Notes on data engineering, distributed systems, and the occasional detour through music theory and career.

0012 2026-05-04
## Don't pre-aggregate, land the movements and build the photo downstream

Why every consumer who says they only need the totals will eventually ask for the detail.

0011 2026-04-18
## The health table, the one piece of infrastructure I would build first

One row per table per run, raw measurements only. Everything else is a query on top.

0010 2026-03-30
## Tiered freshness, or how I stopped refreshing everything on the same cron

Hot, warm, and cold tiers — partition the pipeline so the tables that matter most get attention first.

0009 2026-03-14
## The three metadata columns I add to every extraction

`_extracted_at`, `_batch_id`, `_source_hash`: what each one costs and what it buys.

0008 2026-02-25
## Staging swap, or how to full-replace a live table without anyone noticing

Load into staging, validate, swap atomically. Zero downtime, trivial rollback.

0007 2026-02-03
## Hard delete detection, the case every cursor misses

The row was there yesterday and gone today. A cursor never sees it, so something else has to.

0006 2026-01-15
## Purity vs. freshness, the tradeoff every pipeline already made

Why full replace is the default, and when to deviate from it.

0005 2025-12-22
## The lies your sources tell you

A field guide to the assumptions that will eventually break your pipeline.

0004 2025-12-04
## The EL myth, and why I started calling it ECL

Pure EL doesn't exist. Every pipeline conforms whether it admits it or not.

0003 2025-10-26
## Incremental ingestion cheat-sheet - Guide for batch ingestion with real (ugly) data

Worked examples of incremental loading rules for different cases.

0002 2025-09-15
## I got promoted to team lead by thinking about scalability

Go for growth, not just technical skills.

0001 2025-07-11
## I got a job by helping my girlfriend with her homework

How helping with Stata homework led to a career in data engineering.