Turning years of screenshots into a searchable journal

FILE 0x89·TURNING YEARS OF SCREENSHOTS INTO A SEARCHABLE JOURNAL

May 18, 2026 · ocr, automation, haiku, journaling

I have a Mac screenshots folder with thousands of files. Most of them are chat conversations — iMessage, Signal, Teams, email — captured because I wanted to remember something specific. They all blur together visually. None of them are searchable. I wanted them transcribed into my journal, dated correctly, so they'd actually be findable later.

What was happening

The screenshots folder grows by a couple of files a day. Each one is some combination of:

a chat conversation (the thing I usually wanted to remember)
a UI screenshot from an app I was debugging
an article excerpt
a receipt or confirmation page
a meme

Manually triaging this is hopeless. Auto-OCR alone isn't enough — plain text dumps lose the structure that makes the conversation useful (who said what, in what order, with what reactions). I wanted "Me: X" / "Person: Y" formatting that reads like a transcript.

What I found

A vision-capable fast model is exactly the right shape for this. Feed it a screenshot, ask it for structured output: is this a chat, who are the participants, what was said in order, were there reactions or attachments? Then route the answer.

The flow:

Watch the screenshots folder for new files.
For each new file, ask Haiku (vision input) to classify and transcribe. Output is a small JSON:

json { "is_chat": true, "app": "imessage", "captured_at": "2026-05-18T10:34:00", "participants": ["Me", "Friend"], "turns": [ {"speaker": "Friend", "text": "Did you see the email?", "ts": "10:31"}, {"speaker": "Me", "text": "Just now. Ugh.", "ts": "10:33", "reactions": ["thumbsup"]} ] }

If is_chat=false, tag the screenshot in metadata but skip journal entry.
If is_chat=true, render a journal entry in the format I actually want to read later:

``` ## iMessage with Friend — 2026-05-18 10:34

Friend (10:31): Did you see the email? Me (10:33): Just now. Ugh. 👍 ```

Post the entry to Day One, back-dated to the screenshot's capture timestamp (so it shows up in the right place when I scroll the journal chronologically). Attach the original screenshot. Tag with chat-screenshot plus the app name.

The fix

The state file is the unflashy but important part. Without it, every cron tick reprocesses every screenshot it can see:

# ~/.state/screenshot_transcribe.json
{
  "processed": {
    "/Users/me/Screenshots/foo.png": {
      "sha256": "...",
      "classified_at": "2026-05-18T03:00:00Z",
      "result": "chat",
      "journal_id": "abc123"
    },
    ...
  }
}

The transcriber is idempotent: it checks the state file by hash, skips anything already processed, and only spends Haiku tokens on genuinely new files. An hourly cron at :17 past each hour catches anything added that hour and adds it. The state file is the only reason the backfill could be safely interrupted and resumed.

Pilot run on the last seven days (82 screenshots): 43 chats journaled, 38 non-chat skipped, 1 duplicate detected. Full backfill against ~2,000 historical screenshots took about three hours of Haiku time at roughly $5 total — cheaper than the single afternoon I'd otherwise have spent re-screenshotting the ones I actually remember.

What I'd do differently

The classification step is doing double duty: it answers both "is this worth journaling?" and "what's the structured content?" Splitting it into two passes would save tokens (don't transcribe non-chats) but cost a round-trip per file. At the volumes I'm dealing with, one pass is the right tradeoff. At ten times the volume, I'd split.

The other thing: I almost added a Haiku call to summarize each transcript into a one-line title. I decided not to — the raw "App with Person — Date Time" header is honest about what the entry is, and any summary inserts another layer of "did the model get it right?" between me and the actual content. The journal is for me to skim, not for the model to interpret.