Adding email to the Memex (and a Dynamo pagination gotcha)

April 26, 2026 · dynamodb, search, homelab

A few hours after shipping the Memex over iMessage and Signal, I added email. The interesting part wasn't the feature — it was a DynamoDB pagination bug I hit while writing the email backfill.

The setup

I already had a DynamoDB mail table holding parsed email metadata (from a mail interceptor Lambda I'd built earlier). To get email into the Memex I needed a one-shot backfill scanner and a 5-minute poller that queries new rows via the date-sorted GSI.

Result: 7,466 emails added on top of the existing 71k messages. Total 78,865 rows in the index.

The bug

Backfill kept blowing up partway through with:

ValidationException: provided starting key does not match the range key predicate

My loop looked roughly like this:

from boto3.dynamodb.conditions import Key

# table is a boto3 Table resource; process() handles one row (both defined elsewhere)
cursor = "2023-01-01T00:00:00Z"
last_evaluated_key = None
while True:
    kwargs = {
        "KeyConditionExpression": Key("gsi3pk").eq("MAIL") & Key("gsi3sk").gt(cursor),
        "IndexName": "gsi3-date",
    }
    if last_evaluated_key:
        kwargs["ExclusiveStartKey"] = last_evaluated_key
    resp = table.query(**kwargs)
    for item in resp["Items"]:
        process(item)
        cursor = item["gsi3sk"]   # <-- this is the trap
    last_evaluated_key = resp.get("LastEvaluatedKey")
    if not last_evaluated_key:
        break

Spot the issue? I was advancing cursor inside the loop while DynamoDB was still paginating with ExclusiveStartKey. On the next page, Dynamo validates that the start key satisfies the KeyConditionExpression. By then the predicate has become gsi3sk > <newer-cursor>, while the start key's gsi3sk sits at or before that cursor, so it falls outside the range. Validation fails and the whole pagination dies.
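To make the mismatch concrete, here is a toy model of that check. It is not DynamoDB's actual validation code, just the same string comparison applied to made-up timestamps; start_key_in_range is a hypothetical helper name.

```python
def start_key_in_range(start_sk: str, cursor: str) -> bool:
    """Toy stand-in for the check: does the start key satisfy gsi3sk > cursor?"""
    # ISO-8601 UTC timestamps in one fixed format sort correctly as plain strings.
    return start_sk > cursor


original_cursor = "2023-01-01T00:00:00Z"
last_row_sk = "2023-03-15T09:30:00Z"  # made-up sort key of the last row on page one

# Frozen predicate: the start key is still inside gsi3sk > original_cursor.
frozen_ok = start_key_in_range(last_row_sk, original_cursor)

# Buggy loop: cursor was moved up to the last row's sort key,
# so the same start key no longer sorts strictly after it.
moved_cursor = last_row_sk
moved_ok = start_key_in_range(last_row_sk, moved_cursor)

print(frozen_ok, moved_ok)
```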

The fix

Hold the KeyConditionExpression fixed for an entire query session. Only advance the cursor after pagination completes:

cursor = "2023-01-01T00:00:00Z"
while True:
    # Build the query once per session; the predicate stays frozen while paginating.
    kwargs = {
        "KeyConditionExpression": Key("gsi3pk").eq("MAIL") & Key("gsi3sk").gt(cursor),
        "IndexName": "gsi3-date",
    }
    last_evaluated_key = None
    page_max_sk = cursor  # newest sort key seen in this session
    while True:
        if last_evaluated_key:
            kwargs["ExclusiveStartKey"] = last_evaluated_key
        resp = table.query(**kwargs)
        for item in resp["Items"]:
            process(item)
            page_max_sk = max(page_max_sk, item["gsi3sk"])
        last_evaluated_key = resp.get("LastEvaluatedKey")
        if not last_evaluated_key:
            break
    if page_max_sk == cursor:
        break  # the session found nothing newer than cursor, so we're done
    cursor = page_max_sk  # safe to advance: pagination for this session is over

The outer loop advances cursor between query sessions; the inner loop paginates a single session with a frozen predicate. Once a session turns up nothing newer than cursor, the outer loop exits. Done.

Other things I learned that day

The mail table has two row patterns. INBOX#recipient for real emails and a separate sentinel pattern for dedup tracking. Always filter to the real rows or you'll embed nothing-rows with zero text.
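A minimal sketch of that filter, assuming the INBOX# pattern lives in a partition-key attribute called pk. That attribute name is my guess for illustration, and real_emails is a hypothetical helper; adjust both to the actual schema.

```python
from typing import Any, Dict, Iterable, Iterator

REAL_EMAIL_PREFIX = "INBOX#"


def real_emails(
    items: Iterable[Dict[str, Any]], key_attr: str = "pk"
) -> Iterator[Dict[str, Any]]:
    """Yield only rows whose key starts with INBOX#, skipping everything else.

    key_attr is an assumed attribute name, not taken from the real table.
    """
    for item in items:
        if str(item.get(key_attr, "")).startswith(REAL_EMAIL_PREFIX):
            yield item
```

Running each page of results through this before embedding keeps the sentinel rows out of the index.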

Check the actual GSI names. I had a memory note saying my GSIs were named gsi1/gsi2/gsi3. They were actually named gsi1-thread, gsi2-from, gsi3-date. The stale note saved nobody from a typo, so I updated it.
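Rather than trusting a note, the names can be read from the table itself. This sketch pulls them out of a DescribeTable response; gsi_names is a hypothetical helper, and the table name in the comment is a placeholder.

```python
from typing import Any, Dict, List


def gsi_names(description: Dict[str, Any]) -> List[str]:
    """Return the GSI names found in a DescribeTable response dict."""
    # The GlobalSecondaryIndexes key is absent when a table has no GSIs.
    indexes = description.get("Table", {}).get("GlobalSecondaryIndexes", [])
    return [index["IndexName"] for index in indexes]


# Against a live table this would look something like:
#   import boto3
#   resp = boto3.client("dynamodb").describe_table(TableName="mail")  # placeholder name
#   print(gsi_names(resp))
```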

WAL files are sneaky. Before copying a SQLite DB across hosts, run PRAGMA wal_checkpoint(TRUNCATE). Otherwise the .db file alone may depend on pages still living in a WAL that didn't come along, and you'll get "database disk image is malformed" on the other side.
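One way to script that checkpoint with Python's built-in sqlite3 module is sketched below. checkpoint_before_copy is a hypothetical helper name, and treating a busy result as an error is my own choice rather than anything SQLite requires.

```python
import sqlite3
from typing import Tuple


def checkpoint_before_copy(db_path: str) -> Tuple[int, int, int]:
    """Fold the WAL back into the main database file before copying it.

    Returns the pragma's result row: (busy, log_frames, checkpointed_frames).
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
        if row[0] != 0:
            # Another connection blocked the checkpoint; the WAL may still hold pages.
            raise RuntimeError(f"checkpoint blocked for {db_path}")
        return tuple(row)
    finally:
        conn.close()
```

If this raises, something else still has the database busy, and copying the .db file at that point risks the same malformed-image error.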

What I'd do differently

I knew about ExclusiveStartKey-must-match-predicate, but I'd never been bitten by it because most of my Dynamo loops use the start key as the only progress signal. The moment you start mutating the key predicate inside a loop, this trap appears. I'd factor any "advance the predicate" logic outside the pagination loop on principle now.
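One way to enforce that principle is to push pagination into a small generator that never touches the query arguments between pages. This is a sketch, not a drop-in: query_all_pages is a hypothetical helper, and it assumes only that the table object has a boto3-style query method.

```python
from typing import Any, Dict, Iterator


def query_all_pages(table: Any, **query_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item of one query, following LastEvaluatedKey across pages.

    query_kwargs is copied for each request and never mutated, so the
    KeyConditionExpression stays frozen for the whole query session.
    """
    start_key = None
    while True:
        kwargs = dict(query_kwargs)
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key
        resp = table.query(**kwargs)
        yield from resp.get("Items", [])
        start_key = resp.get("LastEvaluatedKey")
        if not start_key:
            return
```

The caller builds the predicate once, iterates the generator to exhaustion while tracking the newest gsi3sk it sees, and only then decides whether to advance the cursor and start a new session.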