AI cleanup: on-device text polishing after transcription

After transcription, Evoglyph optionally runs a second pass that fixes punctuation, casing, and filler words using a small on-device language model. This article explains what the cleanup pass does, how to toggle it, the latency it adds, and how to change the prompt that drives it.

What the cleanup pass does

Raw transcription output is grammatically flat: no capitalisation beyond the first word, minimal punctuation, filler words (uh, um, like) left in, and sentence breaks sometimes missing. The cleanup pass takes that raw text and returns a polished version (proper capitalisation, punctuation inserted at natural pauses, fillers stripped) before the text is injected into your app.

What the pass specifically fixes is controlled by the cleanup prompt (see Editing the cleanup prompt below). The default prompt targets punctuation, casing, and filler removal. It does not paraphrase or summarise; the words you said come out, just tidied.

The model: LFM2-2.6B 4-bit

The cleanup model is mlx-community/LFM2-2.6B-4bit (~1.45 GB), downloaded from Hugging Face on first launch and then stored locally. It runs entirely inside the Evoglyph process via MLX-Swift; no network calls happen at inference time.

Because the model runs in-process on Apple Silicon, it uses the Apple GPU (via Metal) and shares the same privacy guarantee as the transcription stage: your words never leave your Mac during cleanup.

Toggling cleanup on and off

Open the Evoglyph dashboard from the menu bar icon, go to Settings, and flip the AI cleanup toggle. The change takes effect on the next dictation. When cleanup is off, the raw transcription text is injected directly without any LLM pass.

Turning cleanup off is useful when you need exactly what you said: command strings, code identifiers, or structured data where punctuation insertion would break the output.

Latency tradeoff

Cleanup adds time on top of the speech-to-text stage. The benchmark methodology section documents the measured cleanup-stage latency (207ms p50, 1019ms p95 on the 117-fixture editorial-cleanup eval set). Latency varies with utterance length and ambient CPU/GPU load.

If the cleanup delay is noticeable in your workflow, try toggling cleanup off; the speech-to-text stage alone runs at 85ms p50 engine-time. You can also tune the prompt to request less processing (for example, punctuation only, no filler removal).

Cleanup and the full pipeline

Total time from hotkey-release to text appearing in your app includes the transcription stage, the cleanup stage (if enabled), and the injection stage. The benchmark page reports speech-to-text engine-time; the cleanup-stage numbers are in the same methodology section.

Editing the cleanup prompt

The system prompt sent to the cleanup model is fully editable. Open the dashboard, go to Prompt, and type a replacement. The prompt is saved immediately and used from the next dictation onward.

Prompt ideas:

Punctuation only: "Fix punctuation and capitalisation. Do not change any words."
Technical output: "Fix punctuation only. Preserve all identifiers, URLs, and command syntax exactly as spoken."
Minimal: "Capitalise the first word and add a period at the end. Change nothing else."

The cleanup prompt is saved in ~/Library/Application Support/evoglyph/ alongside other settings.

See App dashboard for a full walkthrough of the dashboard window.

Language note

The cleanup pipeline (model prompt, LoRA fine-tuning, and evaluation fixtures) is English-trained. Results on other languages are untested.