An OpenAI-compatible shim so Home Assistant could talk to my agent
I wanted "Hey, $name" voice control around the house using local
satellite mics, with the actual brain being my own assistant on the
homelab. Home Assistant has a slick voice pipeline, but its
conversation agent surface speaks OpenAI's /v1/chat/completions
schema, not mine.
What was happening
HA's built-in OpenAI Conversation integration wants:
- A base URL ending in /v1
- A POST /v1/chat/completions accepting the OpenAI messages schema
- A GET /v1/models returning at least one model
- Bearer token auth
My assistant already had a POST /voice/chat endpoint that took a
single text field and returned a short reply. Wiring HA to that
directly meant either patching the integration or running an
ollama-shaped proxy in front of it. Both felt heavier than the
problem.
What I found
The OpenAI schema HA uses is very small. Building a shim that
translates HA's request into my existing voice.answer() call and
wraps the reply in the OpenAI envelope was about a hundred lines.
from fastapi import APIRouter, Header
from pydantic import BaseModel

# `voice` (the existing assistant module) and `_check_bearer`
# are defined elsewhere in the app and are not shown here.

router = APIRouter()


class Message(BaseModel):
    role: str
    content: str


class ChatCompletionsRequest(BaseModel):
    model: str
    messages: list[Message]
    stream: bool | None = False  # accepted but ignored; the reply is never streamed


@router.get("/v1/models")
async def models(authorization: str = Header(...)):
    _check_bearer(authorization)
    return {"object": "list", "data": [
        {"id": "cass-voice", "object": "model"}
    ]}


@router.post("/v1/chat/completions")
async def chat(req: ChatCompletionsRequest,
               authorization: str = Header(...)):
    _check_bearer(authorization)
    # last user message is enough for HA's use case
    user_text = next(
        (m.content for m in reversed(req.messages) if m.role == "user"),
        ""
    )
    reply = await voice.answer(user_text)
    # wrap the plain-text reply in the OpenAI chat.completion envelope
    return {
        "id": "chatcmpl-shim",
        "object": "chat.completion",
        "model": req.model,
        "choices": [{
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": reply},
        }],
    }
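The snippet calls _check_bearer without showing it. One way it could work, as a rough sketch only: parse the Authorization header and compare the token in constant time. The name bearer_token_ok and its two-argument signature are hypothetical, not the actual helper; in the FastAPI app, _check_bearer would presumably raise an HTTPException with status 401 when a check like this returns False.

```python
import hmac


def bearer_token_ok(authorization: str, expected_token: str) -> bool:
    """Return True if the header is 'Bearer <expected_token>'.

    Hypothetical helper for illustration; not the shim's real code.
    """
    if not expected_token:
        # Never accept anything when no token has been configured.
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

Keeping the check as a pure function makes it easy to unit-test without spinning up the web app.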
In HA: install the OpenAI Conversation integration, point base URL
at https://my-host/v1, paste the voice token as the API key,
pick cass-voice as the model. Done.
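Before touching the HA config, it can help to hit both endpoints by hand. The following is a small smoke-test sketch using only the standard library. The function names are made up for illustration, the token is a placeholder, and the base URL is the https://my-host/v1 from above.

```python
import json
import urllib.request


def build_models_request(base_url: str, token: str) -> urllib.request.Request:
    """GET {base_url}/models with bearer auth."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def build_chat_request(base_url: str, token: str, text: str,
                       model: str = "cass-voice") -> urllib.request.Request:
    """POST {base_url}/chat/completions with a single user message."""
    body = {"model": model, "messages": [{"role": "user", "content": text}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling send(build_models_request("https://my-host/v1", token)) should return a list containing cass-voice, and the chat request's reply text sits at ["choices"][0]["message"]["content"] in the decoded response.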
The fix
The full architecture: HA Voice satellite hardware (the $59 PE or a $17 Atom Echo) → local Whisper for STT → my shim endpoint → fast model on the homelab → local Piper for TTS → speaker on the same satellite. Wake word is openWakeWord. Round-trip latency hovers around 1.5–2.5 seconds depending on which room the satellite is in.
Everything except the LLM call stays on local hardware. The shim is the only network hop, and it's on the same LAN.
What I'd do differently
If I'd known how small the schema was, I would have built the shim the same evening I started looking at HA voice instead of spending a day researching alternative integrations. "What's the smallest adapter I can write?" is usually the right first question when a third-party tool wants a specific protocol.
The other thing: I built the shim to ignore everything in the request except the latest user message. That's fine for one-shot voice queries, but if I ever want HA to carry multi-turn context, I'll need to map its messages array into my own conversation store. I'm documenting the limit in the docstring so future me doesn't think it's broken.
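If multi-turn ever becomes necessary, the first step would probably be splitting the incoming messages array into prior turns plus the newest user utterance. A minimal sketch of that split follows. The Turn type and split_history name are hypothetical, and how the turns would land in the conversation store is left out because that part depends on the store.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    role: str
    content: str


def split_history(messages: list[dict]) -> tuple[list[Turn], str]:
    """Split OpenAI-style messages into (prior turns, latest user text).

    System messages are dropped; only user/assistant turns are kept.
    Anything after the latest user message is discarded.
    """
    turns = [
        Turn(m["role"], m.get("content") or "")
        for m in messages
        if m.get("role") in ("user", "assistant")
    ]
    # Walk backwards to find the most recent user turn.
    for i in range(len(turns) - 1, -1, -1):
        if turns[i].role == "user":
            return turns[:i], turns[i].content
    # No user message at all: keep whatever history exists, empty prompt.
    return turns, ""
```

The latest text would still go to voice.answer() as it does today, with the prior turns available as context once the assistant side can accept them.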