
Voice on watchOS without the Speech framework

April 22, 2026 · swiftui, watchos, accessibility

I wanted to talk to my homelab assistant from the Apple Watch. Action Button down, dictate, get a spoken reply back. Standard project except for one detail: I have cerebral palsy, so the dictation has to honor the system's "Listen for Atypical Speech" setting, and SFSpeechRecognizer doesn't.

What was happening

My first instinct was to import Speech into the watchOS target, spin up an SFSpeechRecognizer, and stream audio. That doesn't work — the Speech module isn't available on watchOS at all. The target won't even build.
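A cheap way to catch this before writing any recognizer code is a compile-time probe. The sketch below relies only on Swift's `canImport` condition; the names `speechFrameworkAvailable` and `dictationPlan` are made up for illustration.

```swift
// Compile-time probe: only one branch is compiled, depending on the target SDK.
#if canImport(Speech)
import Speech
let speechFrameworkAvailable = true
#else
let speechFrameworkAvailable = false
#endif

// Hypothetical helper that turns the probe result into a readable decision.
func dictationPlan() -> String {
    speechFrameworkAvailable
        ? "Speech is importable on this platform"
        : "No Speech module here: route input through the system text panel"
}

print(dictationPlan())
```

Built for a watchOS target, the `#else` branch is the one that compiles, which is the same information the failed build gives you, just without the failed build.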

Even if it did, the Atypical Speech model is a system-wide accessibility setting tied to Siri's dictation, not something you opt into by configuring SFSpeechRecognizer. The only way to get the corrected transcription is to route through whatever surface the OS uses for system dictation.

What I found

TextFieldLink in SwiftUI. It's a watchOS-only view (watchOS 9 and later) that opens the system text input panel, the same one you get from the keyboard glyph in Messages. Because dictation happens inside that panel, it goes through the OS and picks up the system's speech accessibility settings. The user dictates into it and you get a plain String back, already corrected.

The whole interaction model becomes:

  1. User taps a mic button (or, with one Shortcut hop, presses the Action Button).
  2. SwiftUI opens TextFieldLink.
  3. User dictates.
  4. App POSTs the text to a /voice/chat endpoint on my server.
  5. Server runs a fast model, returns a short reply.
  6. Watch displays the reply and speaks it via AVSpeechSynthesizer.

No on-device STT code on my side. The accessibility behavior is the OS's responsibility, which is where it belongs.
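Step 6 of the flow is the only part that touches audio. A minimal sketch of it, assuming a small wrapper class (`ReplySpeaker` is a hypothetical name) and an `en-US` voice, might look like this:

```swift
import AVFoundation

// Hypothetical wrapper around the system speech synthesizer.
final class ReplySpeaker {
    // Held as a property so the synthesizer outlives the call that starts speech.
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ reply: String, language: String = "en-US") {
        // Cut off any reply still being spoken so answers don't pile up.
        if synthesizer.isSpeaking {
            _ = synthesizer.stopSpeaking(at: .immediate)
        }
        let utterance = AVSpeechUtterance(string: reply)
        // The language code is an assumption; nil falls back to the system default voice.
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        synthesizer.speak(utterance)
    }
}
```

Keeping one long-lived `ReplySpeaker` (for example as a property on the view model) avoids creating a new synthesizer per reply.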

The fix

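import SwiftUI

struct AskView: View {
    // Last dictated text, kept so the UI can show what was sent.
    @State private var dictated: String = ""

    var body: some View {
        // Opens the system text input panel; onSubmit receives the final String.
        TextFieldLink(prompt: Text("Ask")) {
            Image(systemName: "mic.fill")
        } onSubmit: { text in
            dictated = text
            Task { await send(text) } // send(_:) posts the text to the server
        }
    }
}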

The server side is a small endpoint that takes the dictated text and a persisted conv_id (kept in UserDefaults on the watch so the conversation survives across taps). The request body looks like this:

struct VoiceRequest: Encodable {
    let text: String
    let conv_id: String?
}
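To show how that body could travel to /voice/chat, here is a hedged sketch of the request-building and reply-handling halves. The post does not show the server's response, so `VoiceReply` and its fields are assumptions, as are the function names, the `"conv_id"` defaults key, and the idea of passing the base URL and token in as parameters. `VoiceRequest` is repeated so the block stands alone.

```swift
import Foundation

// Request body from the post; the server expects the snake_case key conv_id.
struct VoiceRequest: Encodable {
    let text: String
    let conv_id: String?
}

// Assumed response shape; adjust to whatever the endpoint actually returns.
struct VoiceReply: Decodable {
    let reply: String
    let conv_id: String
}

// Builds the POST for /voice/chat. baseURL and token are supplied by the caller.
func makeVoiceRequest(text: String, convID: String?, baseURL: URL, token: String) throws -> URLRequest {
    let url = baseURL.appendingPathComponent("voice").appendingPathComponent("chat")
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    // A nil conv_id is simply omitted from the JSON by the synthesized encoder.
    request.httpBody = try JSONEncoder().encode(VoiceRequest(text: text, conv_id: convID))
    return request
}

// Decodes the reply and persists the conversation ID for the next tap.
func handleReply(_ data: Data, defaults: UserDefaults = .standard) throws -> String {
    let decoded = try JSONDecoder().decode(VoiceReply.self, from: data)
    defaults.set(decoded.conv_id, forKey: "conv_id")
    return decoded.reply
}
```

A `send(_:)` along these lines would read the stored ID with `UserDefaults.standard.string(forKey: "conv_id")`, pass the request to `URLSession`, and feed the returned data to `handleReply`.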

The bearer token is baked into Info.plist at build time, sourced from the server's env file via xcodegen, so I can rotate it without editing Swift.
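Reading the token back at runtime is a one-liner on `Bundle`. In this sketch the key name `VoiceBearerToken` is a placeholder, since the post does not say which Info.plist key the build writes:

```swift
import Foundation

// "VoiceBearerToken" is a hypothetical Info.plist key; use whichever key the build injects.
func bearerToken(from bundle: Bundle = .main, key: String = "VoiceBearerToken") -> String? {
    // Treat a missing or empty value as "no token" so the caller can fail early.
    guard let value = bundle.object(forInfoDictionaryKey: key) as? String,
          !value.isEmpty else {
        return nil
    }
    return value
}
```

Returning an optional keeps a misconfigured build visible at the call site instead of sending requests with an empty Authorization header.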

What I'd do differently

I burned an evening trying to make SFSpeechRecognizer work on watchOS before I read the docs carefully enough to notice the framework wasn't there. The next time I want a platform-native input, I want to start by listing what's available on that platform's SDK rather than assuming the iOS surface area carries over. TextFieldLink turned out to be the better answer anyway — the OS does all the accessibility work for free.