Designing a receipt matcher for my financial dashboard

April 20, 2026 · personal-finance, design, gmail

Most of my transactions have a receipt sitting in Gmail somewhere. I just want to be able to click a charge in my finance dashboard and see the actual invoice, not the line item total. So I sketched out a receipt matcher.

The shape

A small CLI plus a nightly cron job:

receipt-matcher sync --days 30

For each transaction in DynamoDB without a linked receipt:

  1. Search Gmail in a ±2-day window around the transaction date.
  2. Score candidates by amount match and merchant-name overlap.
  3. Pick the highest-scoring candidate above a threshold.
  4. Render that email to PDF via WeasyPrint.
  5. Store the PDF in S3.
  6. Patch the DynamoDB transaction row with new attributes:
     - receipt_s3_key
     - receipt_gmail_msg_id
     - receipt_match_score
     - receipt_matched_at
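
Step 6 could be sketched as a small helper that builds the arguments for a boto3 `Table.update_item` call. The four attribute names come from the list above; the function name, the `txn_id` partition key, and the condition that refuses to overwrite an existing match are assumptions for illustration, not the actual schema.

```python
from datetime import datetime, timezone
from decimal import Decimal


def build_receipt_patch(txn_id, s3_key, gmail_msg_id, score, now=None):
    """Build keyword arguments for a boto3 Table.update_item call.

    Assumes the table's partition key is named "txn_id" (hypothetical).
    """
    now = now or datetime.now(timezone.utc)
    return {
        "Key": {"txn_id": txn_id},
        "UpdateExpression": (
            "SET receipt_s3_key = :k, receipt_gmail_msg_id = :m, "
            "receipt_match_score = :s, receipt_matched_at = :t"
        ),
        # Only patch rows that have no receipt yet, so a rerun never
        # overwrites an earlier match.
        "ConditionExpression": "attribute_not_exists(receipt_s3_key)",
        "ExpressionAttributeValues": {
            ":k": s3_key,
            ":m": gmail_msg_id,
            # boto3's resource API rejects floats, so the score is a Decimal.
            ":s": Decimal(str(round(score, 4))),
            ":t": now.isoformat(),
        },
    }
```

Usage would look something like `table.update_item(**build_receipt_patch(...))`, where `table` is a boto3 DynamoDB `Table` resource.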

The dashboard UI then renders a signed S3 URL next to the transaction so I can click straight to the receipt PDF.

Why HTML to PDF and not just save the .eml

I want the receipt to be useful out of context — printable, attachable to a tax return, viewable on a phone without an email client. WeasyPrint renders the HTML body, inlines the images, drops the email chrome (signatures, footers, ads), and gives me a portrait PDF that looks like a receipt.
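
A minimal sketch of the render step, assuming the email is available as raw RFC 822 bytes. `extract_html_body` and `render_pdf` are hypothetical names; the only WeasyPrint call used is `HTML(string=...).write_pdf()`, which returns the PDF as bytes when no target is given. Stripping signatures, footers, and ads is not shown here and would need its own pass over the HTML.

```python
import html
from email import message_from_bytes, policy


def extract_html_body(raw_email: bytes) -> str:
    """Return the HTML body of a raw message, wrapping plain text in <pre>."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    body = msg.get_body(preferencelist=("html", "plain"))
    if body is None:
        raise ValueError("message has no text body")
    content = body.get_content()
    if body.get_content_type() == "text/plain":
        # Plain-text receipts still need markup for the renderer.
        return "<pre>" + html.escape(content) + "</pre>"
    return content


def render_pdf(html_body: str) -> bytes:
    """Render HTML to PDF bytes; requires the third-party weasyprint package."""
    # Imported lazily so the body extraction works without WeasyPrint installed.
    from weasyprint import HTML

    return HTML(string=html_body).write_pdf()
```

The resulting bytes can go straight to S3 without touching the local disk.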

Scoring

Two signals: amount match and merchant-name overlap.

Threshold: about 0.7 combined score. Below that, leave the transaction unmatched and try again next sync.
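
Here is one way the scoring and threshold could fit together. The 0.7 threshold is from the design above; everything else is an assumption for illustration: the function names, the 0.6/0.4 weighting, treating the amount signal as all-or-nothing, and measuring merchant overlap as the share of merchant-name tokens found in the sender and subject.

```python
import re


def amount_score(txn_amount: float, email_text: str) -> float:
    """1.0 if the transaction amount appears in the email text, else 0.0."""
    found = re.findall(r"\d[\d,]*\.\d{2}", email_text)
    amounts = {float(m.replace(",", "")) for m in found}
    return 1.0 if any(abs(a - txn_amount) < 0.005 for a in amounts) else 0.0


def merchant_score(merchant: str, sender: str, subject: str) -> float:
    """Fraction of merchant-name tokens found in the sender or subject."""
    tokens = set(re.findall(r"[a-z0-9]+", merchant.lower()))
    haystack = set(re.findall(r"[a-z0-9]+", (sender + " " + subject).lower()))
    return len(tokens & haystack) / len(tokens) if tokens else 0.0


def combined_score(amount: float, merchant: float, w_amount: float = 0.6) -> float:
    """Weighted sum of the two signals (the weights are a guess)."""
    return w_amount * amount + (1.0 - w_amount) * merchant


def best_match(candidates, threshold: float = 0.7):
    """Return the top (msg_id, score) pair at or above threshold, else None."""
    best = max(candidates, key=lambda c: c[1], default=None)
    return best if best is not None and best[1] >= threshold else None
```

With these weights an exact amount match alone scores 0.6, so a candidate also needs some merchant overlap to clear the threshold. Bank-style merchant strings such as abbreviated names would need normalizing before a token overlap like this is useful.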

Bidirectional linking

The other direction matters too. From any receipt in Gmail I want to be able to find the transaction it paid for. Same join key, just queried the other way. Lets me answer questions like "what did I actually buy at Amazon on March 3?" without manually scrolling through the order history.

Storing the Gmail message ID on the transaction row gives me both directions for free.
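
For the receipt-to-transaction direction, a DynamoDB lookup by a non-key attribute normally goes through a secondary index rather than a scan. The post doesn't say how the table is indexed, so the sketch below assumes a global secondary index keyed on `receipt_gmail_msg_id`; the index name and function name are hypothetical.

```python
def reverse_lookup_query(
    gmail_msg_id: str,
    index_name: str = "receipt_gmail_msg_id-index",  # hypothetical index name
) -> dict:
    """Build keyword arguments for a boto3 Table.query call on the index."""
    return {
        "IndexName": index_name,
        "KeyConditionExpression": "receipt_gmail_msg_id = :m",
        "ExpressionAttributeValues": {":m": gmail_msg_id},
    }
```

Calling `table.query(**reverse_lookup_query(msg_id))` would then return the transaction rows linked to that message, if any.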

Why nightly, not real-time

Plaid posts transactions on a delay (sometimes hours, sometimes a day or two), and merchants can take their time emailing receipts. Trying to match in real time at the moment the transaction lands means the receipt often isn't there yet. A nightly sweep with a ±2-day window catches the long tail. The trade-off is that I won't see the receipt link in the dashboard immediately, which I'm fine with — finance review is a weekly thing, not a real-time thing.
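
The ±2-day window maps onto Gmail's `after:` and `before:` search operators. A rough sketch, assuming `after:` includes the given date and `before:` excludes it, which is why the upper bound is pushed one day past the window. The function name is hypothetical, and time-zone handling at the day boundaries is ignored.

```python
from datetime import date, timedelta


def gmail_window_query(txn_date: date, days: int = 2) -> str:
    """Build a Gmail search string covering txn_date plus or minus `days`."""
    start = txn_date - timedelta(days=days)
    # Assumes before: is exclusive, so add one extra day to keep the last day.
    end = txn_date + timedelta(days=days + 1)
    return f"after:{start:%Y/%m/%d} before:{end:%Y/%m/%d}"
```

For a charge dated March 3, 2026 this produces `after:2026/03/01 before:2026/03/06`, which can be combined with merchant keywords in the same query string.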

What this unlocks

The downstream value is bigger than "click for receipt." Once every charge is linked to an invoice with line items, two new things become possible:

What I'd do differently

Build this as an extension of the existing mail pipeline rather than a separate Gmail search. I already have every email I receive in DynamoDB. Querying my own table is cheaper and faster than calling Gmail's API and avoids the OAuth refresh-token dance. So: query locally first, and fall back to a Gmail search only if the receipt isn't in the Dynamo mail table yet. That should cut the typical sync time significantly.
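
The local-first lookup is simple enough to sketch with the two searches passed in as callables. `find_candidates`, `search_local`, and `search_gmail` are hypothetical names; the real versions would wrap the Dynamo mail table and the Gmail API.

```python
from typing import Callable, Iterable, List


def find_candidates(
    txn,
    search_local: Callable[[object], Iterable],
    search_gmail: Callable[[object], Iterable],
) -> List:
    """Query the local mail table first; call Gmail only on a miss."""
    local = list(search_local(txn))
    if local:
        return local
    # Nothing in the local table yet, so pay for the remote search.
    return list(search_gmail(txn))
```

Keeping the two sources behind the same call signature means the scoring step doesn't need to know where a candidate came from.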