Benchmark | evoglyph

Evoglyph sells speed and locality. This page puts numbers on both, next to the alternatives developers shortlist: Wispr Flow, Superwhisper, and Apple Dictation. Our numbers are measured; competitor numbers are cited, with retrieval dates so you can check them.

How Evoglyph compares

Metric	evoglyph	Wispr Flow	Superwhisper	Apple Dictation
p50 latency	~600ms (full pipeline, cleanup on)¹	~700ms (end-to-end, as reported by Voibe)²	unconfirmed³	150-400ms (engine-level, as reported by Dictato)⁴
WER on LibriSpeech test-clean	2.41% (our harness, 75-clip subset)⁵	unconfirmed	unconfirmed³	~8% (as reported by MacRumors; different benchmark)⁶
Fully local?	Yes⁷	No⁸	Partial (local models on Apple Silicon; optional cloud)³	Configurable⁹

Evoglyph rows are our own measurements. Competitor rows are vendor-published or reviewer-reported figures, cited below; no two products here are measured on a common harness. A cell marked unconfirmed means the vendor publishes no value and we could not triangulate one from a credible third party.

Read the latency row with scope in mind. Apple's 150-400ms is an engine benchmark, not a full pipeline. Evoglyph's ~600ms includes an on-device AI cleanup pass that is on by default; with cleanup off, a short dictation lands in about 300ms.¹ Cleanup included, evoglyph still comes in under Wispr Flow's reviewer-reported ~700ms.² Superwhisper publishes no performance figures at all, so its cells stay unconfirmed.³ On accuracy the reported gap is wide, 2.41% on our harness against Apple's reported ~8%, but the two figures come from different benchmarks and are not directly comparable.⁶

Metrics only we publish

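No other vendor on this page publishes these, so there is nothing to compare against. Dictation engines can fail in two opposite ways: returning nothing when someone spoke, or emitting phantom text from silence or background noise. The last two rows of the table below track those two failure modes.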

Metric	evoglyph
p95 latency	~1.5s (full pipeline, cleanup on)¹
Empty outputs on speech	0 in 100 clips⁵
Hallucinated text on silence	0 in 5 clips⁵

Why the cloud round-trip matters

Wispr Flow transcribes in the cloud, so every dictation pays for the network: TLS handshake, audio upload, server inference, text download.⁸ That cost is structural; no patch removes it. Reviewers put the result at roughly 700ms end-to-end.² Evoglyph runs the whole pipeline on-device, with speech-to-text on the Apple Neural Engine. No round-trip. Audio never leaves your Mac.⁷

Local alone does not distinguish us: Superwhisper runs local models on Apple Silicon³ and Apple Dictation has an on-device mode.⁹ The difference is that we publish our numbers: a full-pipeline latency of ~600ms with the on-device cleanup pass included,¹ and a measured 2.41% error rate.⁵

Methodology

How we measure latency

Latency covers the full pipeline, from hotkey release to text in the focused app, measured across about 2,500 real dictations: one user, one machine, an Apple M4 Max.¹ The pipeline runs voice-activity trimming, Parakeet TDT speech-to-text on the Apple Neural Engine, vocabulary boosting, and AI cleanup, all on-device, with cleanup on by default. A short dictation of a sentence or two runs ~600ms p50 and ~1.5s p95; with cleanup off, ~300ms p50 and ~390ms p95. Latency grows with input length: a 15- to 30-second paragraph runs roughly 1.6s p50 and 2.4s p95, and longer dictations take several seconds, because vocabulary boosting and cleanup both work through more text.¹
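As a minimal sketch of how p50 and p95 figures like these can be derived from a list of per-dictation timings: the function names below are hypothetical, the nearest-rank method is an assumption (the production analysis may interpolate differently), and the sample values are placeholders, not measurements.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Report the two percentiles used on this page."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
    }


# Placeholder timings in milliseconds (illustrative only, not real data).
example_ms = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(summarize(example_ms))
```

Short dictations and paragraph-length dictations would be summarized separately, since latency depends on input length.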

Sub-stage times do not sum to the pipeline totals, because percentiles are not additive: the p50 of the whole pipeline is generally not the sum of the per-stage p50s. The raw ASR-engine slice runs about 120ms p50 in production,¹ and our eval harness reports about 85ms on clean fixtures.⁵ Neither is a number you feel, so the table leads with the full pipeline.
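A small worked example shows why stage percentiles cannot simply be added. The stage names and timings here are hypothetical placeholders chosen to make the effect obvious; they are not measurements from our pipeline.

```python
from statistics import median

# Hypothetical per-dictation stage timings in ms (placeholders, not data).
# Each index is one dictation: a slow ASR run paired with a fast cleanup,
# and vice versa.
asr_ms = [100, 100, 900]
cleanup_ms = [900, 100, 100]

# End-to-end time for each dictation is the sum of its own stages.
total_ms = [a + c for a, c in zip(asr_ms, cleanup_ms)]

sum_of_medians = median(asr_ms) + median(cleanup_ms)
median_of_sums = median(total_ms)

print(sum_of_medians, median_of_sums)
```

Here each stage has a median of 100ms, yet two of the three dictations take 1000ms end to end, so the pipeline median is 1000ms rather than 200ms. Only the percentile computed on the per-dictation totals describes the pipeline.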

How we measure accuracy

WER, empty-output, and silence-hallucination numbers come from a 105-clip public evaluation set: 75 clips from LibriSpeech test-clean, 20 from the VCTK corpus, 5 silence clips, and 5 vocabulary-dense clips, run on the same M4 Max.⁵ Reference transcripts and model output both pass through OpenAI's EnglishTextNormalizer before WER is computed, so punctuation and contraction artifacts do not inflate the error rate. Cleanup quality comes from a separate 117-fixture editorial-cleanup eval against the production prompt and LoRA bundle that ship in the app.¹⁰
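The normalize-then-score step can be sketched as follows. This is an illustration under stated assumptions, not our harness: `normalize` is a simplified stand-in for OpenAI's EnglishTextNormalizer (it only lowercases and strips punctuation), the function names are hypothetical, and pooling errors across clips before dividing is an assumption about how the aggregate is formed.

```python
import re


def normalize(text):
    """Simplified stand-in normalizer: lowercase, replace punctuation
    with spaces, and split into words. The real normalizer also handles
    contractions, numbers, and spelling variants."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance: the minimum number of
    substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """WER over (reference, hypothesis) pairs: total word errors
    divided by total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = normalize(reference)
        hyp = normalize(hypothesis)
        errors += edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("WER is undefined without reference words")
    return errors / ref_words


# Toy transcripts for illustration only (not clips from the eval set).
example_pairs = [
    ("The cat sat on the mat.", "the cat sat on a mat"),
    ("Hello, world!", "hello world"),
]
print(corpus_wer(example_pairs))  # 1 error over 8 reference words: 0.125
```

Silence clips have an empty reference, so WER is undefined for them; that is why any text emitted on silence is counted as its own metric.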

Data availability

Raw per-clip results from both evals are available on request by email. We do not update this page silently: each new baseline lands as a dated entry in our internal eval/results/ archive with a new citation here.

Sources

1. Evoglyph full-pipeline end-to-end latency (~600ms p50 / ~1.5s p95 for a short dictation with cleanup on; ~300ms p50 / ~390ms p95 with cleanup off; ~120ms p50 for the ASR-engine slice), measured from the production transcriptions database across about 2,500 real dictations in daily use on an Apple M4 Max (Parakeet TDT v2 + LFM2-2.6B 4-bit cleanup, on by default). Internal measurement; see internal-docs/latency-claims-handoff.md. 2026-06-24.
2. Wispr Flow latency, ~700ms end-to-end (cloud round-trip): Voibe Wispr Flow review. Voibe is a competing dictation vendor's review site, not a neutral lab. Retrieved 2026-05-11.
3. Superwhisper architecture and published figures: superwhisper.com. Works offline with local models on Apple Silicon; cloud models offered, and recommended for Intel Macs. The site publishes no latency or accuracy figures. Retrieved 2026-07-01.
4. Apple Dictation / SpeechAnalyzer latency, 150-400ms (engine-level): Dictato engine benchmark. Retrieved 2026-05-11.
5. Evoglyph WER, empty-output, and silence-hallucination numbers, plus eval-harness engine time (~85ms): internal eval results, app-repo eval/results/v5p6-shipping-baseline-2026-05-11.md and README Evaluation section (#parakeet-vs-whisperkit-105-public-clips). Hardware: Apple M4 Max, 64 GB. Eval set: 105 public clips (75 LibriSpeech test-clean + 20 VCTK + 5 silence + 5 vocab-dense). 2026-05-11.
6. Apple SpeechAnalyzer WER ~8% (CER ~3%): MacRumors transcription benchmark. Different dataset and methodology than LibriSpeech test-clean; a like-with-caveats comparison. Retrieved 2026-05-11.
7. Evoglyph architecture (local-only, proprietary): Privacy and local-first docs page. 2026-05-11.
8. Wispr Flow cloud transcription: wisprflow.ai/privacy, which states that "transcription always happens in the cloud to provide the best speed and accuracy". Retrieved 2026-05-11.
9. Apple Dictation on-device processing: Apple support page. Keyboard settings show whether voice input is "processed on your device and not sent to Siri servers". Retrieved 2026-06-06.
10. Evoglyph cleanup-stage quality and eval-harness latency (p50 207ms, p95 1019ms on the 117-fixture editorial-cleanup eval set), on the production prompt and LoRA bundle that ship in the app: internal eval results, app-repo eval/results/v5p6-shipping-baseline-2026-05-11.md. Hardware: Apple M4 Max, 64 GB. 2026-05-11.