An OpenAI-compatible shim so Home Assistant could talk to my agent

May 3, 2026 · home-assistant, voice, fastapi

I wanted "Hey, $name" voice control around the house using local satellite mics, with the actual brain being my own assistant on the homelab. Home Assistant has a slick voice pipeline, but the conversation agent I wanted to plug into it (the OpenAI Conversation integration) speaks OpenAI's /v1/chat/completions schema, not my assistant's.

What was happening

HA's built-in OpenAI Conversation integration wants:

A base URL ending in /v1
A POST /v1/chat/completions accepting the OpenAI messages schema
A GET /v1/models returning at least one model
Bearer token auth
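That list is the whole contract on the HA side. One way to keep a shim honest against it is a tiny stdlib-only checker for the response shapes. This is a hypothetical helper, not part of the original shim, and the field lists are the subset of OpenAI's public schema that clients commonly read, not the full spec:

```python
from typing import Any

# Keys an OpenAI-style client reads from a chat.completion body (a subset of the spec).
CHAT_KEYS = ("id", "object", "created", "model", "choices")
CHOICE_KEYS = ("index", "finish_reason", "message")
MESSAGE_KEYS = ("role", "content")


def missing_chat_fields(resp: dict[str, Any]) -> list[str]:
    """Return dotted paths of expected fields absent from a chat.completion body."""
    missing = [k for k in CHAT_KEYS if k not in resp]
    for i, choice in enumerate(resp.get("choices", [])):
        missing += [f"choices[{i}].{k}" for k in CHOICE_KEYS if k not in choice]
        message = choice.get("message", {})
        missing += [f"choices[{i}].message.{k}" for k in MESSAGE_KEYS if k not in message]
    return missing


def missing_model_fields(resp: dict[str, Any]) -> list[str]:
    """Same idea for GET /v1/models: a list object with at least one model entry."""
    missing = [k for k in ("object", "data") if k not in resp]
    if not resp.get("data"):
        missing.append("data[0]")  # HA needs at least one model to pick from
    for i, model in enumerate(resp.get("data", [])):
        missing += [f"data[{i}].{k}" for k in ("id", "object") if k not in model]
    return missing
```

Returning paths instead of raising makes it easy to drop into a unit test and see every gap at once.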

My assistant already had a POST /voice/chat endpoint that took a single text field and returned a short reply. Wiring HA to that directly meant either patching the integration or running an ollama-shaped proxy in front of it. Both felt heavier than the problem.

What I found

The OpenAI schema HA uses is very small. Building a shim that translates HA's request into my existing voice.answer() call and wraps the reply in the OpenAI envelope was about a hundred lines.

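import hmac
import os
import time

from fastapi import APIRouter, Header, HTTPException
from pydantic import BaseModel

from app import voice  # hypothetical import path for the assistant's existing voice module

router = APIRouter()

# Shared secret HA sends as its "API key" (env var name is illustrative).
VOICE_TOKEN = os.environ["VOICE_TOKEN"]

def _check_bearer(authorization: str) -> None:
    # Expect "Authorization: Bearer <token>"; compare in constant time.
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not hmac.compare_digest(
        token.encode(), VOICE_TOKEN.encode()
    ):
        raise HTTPException(status_code=401, detail="invalid token")

class Message(BaseModel):
    role: str
    # OpenAI allows null content (e.g. assistant tool calls), so don't reject it.
    content: str | None = None

class ChatCompletionsRequest(BaseModel):
    model: str
    messages: list[Message]
    stream: bool | None = False  # accepted but ignored: replies are never streamed

@router.get("/v1/models")
async def models(authorization: str = Header(...)):
    _check_bearer(authorization)
    return {"object": "list", "data": [
        # created/owned_by are part of OpenAI's model object
        {"id": "cass-voice", "object": "model", "created": 0, "owned_by": "homelab"}
    ]}

@router.post("/v1/chat/completions")
async def chat(req: ChatCompletionsRequest,
               authorization: str = Header(...)):
    """Forward only the latest user message; multi-turn history is ignored."""
    _check_bearer(authorization)
    user_text = next(
        (m.content for m in reversed(req.messages)
         if m.role == "user" and m.content),
        ""
    )
    reply = await voice.answer(user_text)
    return {
        "id": "chatcmpl-shim",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [{
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": reply},
        }],
    }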

In HA: install the OpenAI Conversation integration, point the base URL at https://my-host/v1, paste the voice token as the API key, and pick cass-voice as the model. Done.
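Before pointing HA at the shim, it helps to poke it with the same kind of request an OpenAI-style client sends. A minimal stdlib-only sketch; the helper names are hypothetical, and the body follows OpenAI's chat completions schema rather than a capture of HA's actual traffic:

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, model: str, text: str) -> urllib.request.Request:
    """Build a POST to <base_url>/chat/completions shaped like an OpenAI client call."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": text}],
        "stream": False,
    }).encode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def reply_text(body: bytes) -> str:
    """Pull the assistant reply out of a chat.completion response body."""
    return json.loads(body)["choices"][0]["message"]["content"]


# Usage against a running shim (not executed here):
# req = build_chat_request("https://my-host/v1", "<voice token>", "cass-voice", "turn off the kitchen lights")
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(reply_text(resp.read()))
```

Keeping request-building separate from the network call means the shape can be checked offline.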

The fix

The full architecture: HA Voice satellite hardware (the $59 Voice Preview Edition or a $17 M5Stack Atom Echo) → local Whisper for speech-to-text → my shim endpoint → fast model on the homelab → local Piper for text-to-speech → speaker on the same satellite. The wake word engine is openWakeWord. Round-trip latency hovers around 1.5–2.5 seconds, depending on which room the satellite is in.

Everything except the LLM call runs on the Home Assistant side. The shim is the only extra network hop, and it's on the same LAN.

What I'd do differently

If I'd known how small the schema was, I would have built the shim the same evening I started looking at HA voice instead of spending a day researching alternative integrations. "What's the smallest adapter I can write?" is usually the right first question when a third-party tool wants a specific protocol.

The other thing: I built the shim to ignore everything in the request except the latest user message. That's fine for one-shot voice queries, but if I ever want HA to carry multi-turn context, I'll need to map its messages array into my own conversation store. Documenting the limit in the docstring so future me doesn't think it's broken.
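If multi-turn context does become a goal, one possible mapping is sketched below. It assumes the client resends the full history on every call (chat completions is stateless, so OpenAI-style clients do), and it assumes some conversation key is available; where HA would supply that key is an open question. `Turn` and `ConversationStore` are hypothetical stand-ins for the assistant's own store:

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str
    content: str


@dataclass
class ConversationStore:
    """Hypothetical in-memory stand-in for the assistant's conversation store."""
    turns: dict[str, list[Turn]] = field(default_factory=dict)

    def sync(self, conversation_id: str, messages: list[dict]) -> list[Turn]:
        """Store only the turns not seen yet for this conversation and return them."""
        history = self.turns.setdefault(conversation_id, [])
        # Keep user/assistant text; system prompts, tool messages, and
        # assistant tool calls with null content are skipped.
        incoming = [
            Turn(m["role"], m["content"])
            for m in messages
            if m.get("role") in ("user", "assistant") and isinstance(m.get("content"), str)
        ]
        if incoming[:len(history)] != history:
            # History diverged (e.g. the key was reused for a new chat): start over.
            history.clear()
        new = incoming[len(history):]
        history.extend(new)
        return new
```

Treating the stored turns as a prefix of each incoming array avoids double-storing the resent history; the divergence check keeps a trimmed or restarted conversation from corrupting it.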