Benchmark

Evoglyph sells speed and locality. This page puts numbers on both, next to the alternatives developers shortlist: Wispr Flow, Superwhisper, and Apple Dictation. Our numbers are measured; competitor numbers are cited, with retrieval dates so you can check them.

How Evoglyph compares

| Metric | Evoglyph | Wispr Flow | Superwhisper | Apple Dictation |
| --- | --- | --- | --- | --- |
| p50 latency | ~600ms (full pipeline, cleanup on) [1] | ~700ms (end-to-end, as reported by Voibe) [2] | unconfirmed [3] | 150-400ms (engine-level, as reported by Dictato) [4] |
| WER on LibriSpeech test-clean | 2.41% (our harness, 75-clip subset) [5] | unconfirmed | unconfirmed [3] | ~8% (as reported by MacRumors; different benchmark) [6] |
| Fully local? | Yes [7] | No [8] | Partial (local models on Apple Silicon; optional cloud) [3] | Configurable [9] |

Evoglyph figures are our own measurements. Competitor figures are vendor-published or reviewer-reported, cited below; no two products here are measured on a common harness. A cell marked unconfirmed means the vendor publishes no value and we could not triangulate one from a credible third party.

Read the latency row with scope in mind. Apple's 150-400ms is an engine benchmark, not a full pipeline. Evoglyph's ~600ms includes an AI cleanup pass that runs on-device and is on by default; with cleanup off, a short dictation lands in about 300ms [1]. Even with cleanup included, Evoglyph comes in under the ~700ms that reviewers report for Wispr Flow, though the two figures come from different harnesses [2]. Superwhisper publishes no performance figures at all, so its cells stay unconfirmed [3]. On accuracy the gap is wide, with the caveat that the benchmarks differ: 2.41% on our harness against a reported ~8% for Apple [6].

Metrics only we publish

No vendor on this page publishes these, so there is nothing to compare against. Dictation engines can fail in two opposite ways: returning nothing for real speech, or emitting phantom text from silence or background noise. The last two rows track those failure modes.

| Metric | Evoglyph |
| --- | --- |
| p95 latency | ~1.5s (full pipeline, cleanup on) [1] |
| Empty outputs on speech | 0 in 100 clips [5] |
| Hallucinated text on silence | 0 in 5 clips [5] |
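
As a rough sketch of how the last two rows could be tallied, the snippet below counts empty transcripts on speech clips and non-empty transcripts on silence clips. The clip representation, the function name, and the definitions of "empty" and "hallucinated" used here are assumptions for illustration, not a description of the actual eval harness.

```python
def count_failures(clips):
    """Count empty outputs on speech and non-empty outputs on silence.

    Each clip is a hypothetical (kind, transcript) pair, where kind is
    "speech" or "silence". Whitespace-only transcripts count as empty.
    """
    empty_on_speech = 0
    text_on_silence = 0
    for kind, transcript in clips:
        has_text = bool(transcript.strip())
        if kind == "speech" and not has_text:
            empty_on_speech += 1
        elif kind == "silence" and has_text:
            text_on_silence += 1
    return {
        "empty_on_speech": empty_on_speech,
        "text_on_silence": text_on_silence,
    }


# Made-up clips for illustration only; not results from the eval set.
example_clips = [
    ("speech", "open the pull request"),
    ("speech", "   "),          # engine returned nothing usable
    ("silence", ""),            # correct: no text on silence
    ("silence", "thank you"),   # phantom text on silence
]
failures = count_failures(example_clips)
```

In this toy input, one speech clip comes back empty and one silence clip produces phantom text; a result like the one in the table would have both counters at zero.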

Why the cloud round-trip matters

Wispr Flow transcribes in the cloud, so every dictation pays for the network: TLS handshake, audio upload, server inference, text download [8]. That cost is structural; no patch removes it. Reviewers put the result at roughly 700ms end-to-end [2]. Evoglyph runs the whole pipeline on the Apple Neural Engine, inside your process. No round-trip. Audio never leaves your Mac [7].

Local alone does not distinguish us: Superwhisper runs local models on Apple Silicon [3] and Apple Dictation has an on-device mode [9]. The difference is that we publish our numbers: full-pipeline latency, a measured 2.41% error rate [5], and an on-device cleanup pass included in the ~600ms above [1].

Methodology

How we measure latency

Latency is the full pipeline, from hotkey release to text in the focused app, measured across about 2,500 real dictations: one user, one machine, an Apple M4 Max [1]. The pipeline runs voice-activity trimming, Parakeet TDT speech-to-text on the Apple Neural Engine, vocabulary boosting, and AI cleanup, all on-device, cleanup on by default. A short dictation of a sentence or two runs ~600ms p50 and ~1.5s p95; with cleanup off, ~300ms p50 and ~390ms p95. Latency scales with input: a 15- to 30-second paragraph runs roughly 1.6s p50 and 2.4s p95, and longer dictations take several seconds, because vocabulary boosting and cleanup both work through more text [1].
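
To make the p50 and p95 terminology concrete, here is a minimal sketch of computing both from a list of latency samples, grouped by whether cleanup was on. The record layout, the function names, and the sample values are all hypothetical; they are neither the schema of the production transcriptions database nor real measurements.

```python
from collections import defaultdict
from statistics import quantiles


def percentile(samples, pct):
    """pct-th percentile (1-99) with linear interpolation between samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99.
    return quantiles(samples, n=100, method="inclusive")[pct - 1]


def summarize(records):
    """Group latencies by cleanup setting and report p50 / p95 per group.

    Each record is a hypothetical (cleanup_on, latency_ms) pair.
    """
    groups = defaultdict(list)
    for cleanup_on, latency_ms in records:
        groups[cleanup_on].append(latency_ms)
    return {
        key: {"p50": percentile(vals, 50), "p95": percentile(vals, 95)}
        for key, vals in groups.items()
    }


# Synthetic samples for illustration only; not Evoglyph measurements.
records = [
    (True, 100), (True, 200), (True, 300), (True, 400), (True, 500),
    (False, 10), (False, 20), (False, 30),
]
summary = summarize(records)
```

The same grouping idea extends to bucketing by dictation length, which is one way the short-dictation and paragraph-length figures above could be reported separately.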

Sub-stage times do not sum to the pipeline totals, because a percentile of a sum is generally not the sum of the per-stage percentiles. The raw ASR-engine slice runs about 120ms p50 in production [1], and our eval harness reports about 85ms on clean fixtures [5]. Neither is a number you feel, so the table leads with the full pipeline.
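
A toy case shows why per-stage percentiles cannot simply be added up. The two stage lists below are synthetic and chosen so that a slow first stage coincides with a fast second stage; nothing here is measured data.

```python
from statistics import median, quantiles


def p95(samples):
    """95th percentile with linear interpolation (inclusive method)."""
    return quantiles(samples, n=100, method="inclusive")[94]


# Synthetic per-dictation stage times in ms; illustrative only.
stage_a = [1, 2, 9]
stage_b = [9, 2, 1]
totals = [a + b for a, b in zip(stage_a, stage_b)]  # [10, 4, 10]

sum_of_p50 = median(stage_a) + median(stage_b)  # 2 + 2 = 4
p50_of_total = median(totals)                   # 10

sum_of_p95 = p95(stage_a) + p95(stage_b)
p95_of_total = p95(totals)

print(f"p50: stages sum to {sum_of_p50}, pipeline is {p50_of_total}")
print(f"p95: stages sum to {sum_of_p95:.1f}, pipeline is {p95_of_total:.1f}")
```

Here the median of the total exceeds the sum of the stage medians, while the p95 of the total falls below the sum of the stage p95s, so the mismatch can go in either direction.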

How we measure accuracy

WER, empty-output, and silence-hallucination numbers come from a 105-clip public evaluation set: 75 clips from LibriSpeech test-clean, 20 from the VCTK corpus, 5 silence clips, and 5 vocabulary-dense clips, run on the same M4 Max [5]. Reference transcripts and model output both pass through OpenAI's EnglishTextNormalizer before WER is computed, so punctuation and contraction artifacts do not inflate the error rate. Cleanup quality comes from a separate 117-fixture editorial-cleanup eval against the production prompt and LoRA bundle that ship in the app [10].
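
For readers unfamiliar with the metric, the sketch below shows one common way to compute WER after normalization: word-level edit distance summed over all clips, divided by the total number of reference words. The normalizer here is a deliberately simplified stand-in (lowercase, strip punctuation), not OpenAI's EnglishTextNormalizer, and the corpus-level aggregation is an assumption about how a harness might pool clips rather than a statement of how ours does.

```python
import re


def normalize_words(text):
    """Stand-in normalizer: lowercase, drop punctuation, split into words.

    A simplified placeholder, not OpenAI's EnglishTextNormalizer.
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Total word errors divided by total reference words across clips."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = normalize_words(reference)
        hyp = normalize_words(hypothesis)
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("no reference words")
    return errors / ref_words


# Made-up (reference, hypothesis) pairs for illustration only.
pairs = [
    ("Hello, world!", "hello world"),
    ("The quick brown fox.", "the quick fox"),
]
wer = corpus_wer(pairs)
```

In this toy input the first pair differs only in case and punctuation, so normalization removes the difference entirely; the second pair drops one of four words, giving one error over six reference words in total.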

Data availability

Raw per-clip results from both evals are available on request via [email protected]. We do not update this page silently: each new baseline lands as a dated entry in our internal eval/results/ archive with a new citation here.

Sources

  1. Evoglyph full-pipeline end-to-end latency (~600ms p50 / ~1.5s p95 for a short dictation with cleanup on; ~300ms p50 / ~390ms p95 with cleanup off; ~120ms p50 for the ASR-engine slice), measured from the production transcriptions database across about 2,500 real dictations in daily use on an Apple M4 Max (Parakeet TDT v2 + LFM2-2.6B 4-bit cleanup, on by default). Internal measurement; see internal-docs/latency-claims-handoff.md. 2026-06-24.
  2. Wispr Flow latency, ~700ms end-to-end (cloud round-trip): Voibe Wispr Flow review. Voibe is a competing dictation vendor's review site, not a neutral lab. Retrieved 2026-05-11.
  3. Superwhisper architecture and published figures: superwhisper.com. Works offline with local models on Apple Silicon; cloud models offered, and recommended for Intel Macs. The site publishes no latency or accuracy figures. Retrieved 2026-07-01.
  4. Apple Dictation / SpeechAnalyzer latency, 150-400ms (engine-level): Dictato engine benchmark. Retrieved 2026-05-11.
  5. Evoglyph WER, empty-output, and silence-hallucination numbers, plus eval-harness engine time (~85ms): internal eval results, app-repo eval/results/v5p6-shipping-baseline-2026-05-11.md and README Evaluation section (#parakeet-vs-whisperkit-105-public-clips). Hardware: Apple M4 Max, 64 GB. Eval set: 105 public clips (75 LibriSpeech test-clean + 20 VCTK + 5 silence + 5 vocab-dense). 2026-05-11.
  6. Apple SpeechAnalyzer WER ~8% (CER ~3%): MacRumors transcription benchmark. Different dataset and methodology than LibriSpeech test-clean; a like-with-caveats comparison. Retrieved 2026-05-11.
  7. Evoglyph architecture (local-only, proprietary): Privacy and local-first docs page. 2026-05-11.
  8. Wispr Flow cloud transcription: wisprflow.ai/privacy: "transcription always happens in the cloud to provide the best speed and accuracy". Retrieved 2026-05-11.
  9. Apple Dictation on-device processing: Apple support page. Keyboard settings show whether voice input is "processed on your device and not sent to Siri servers". Retrieved 2026-06-06.
  10. Evoglyph cleanup-stage quality and eval-harness latency (p50 207ms, p95 1019ms on the 117-fixture editorial-cleanup eval set), on the production prompt and LoRA bundle that ship in the app: internal eval results, app-repo eval/results/v5p6-shipping-baseline-2026-05-11.md. Hardware: Apple M4 Max, 64 GB. 2026-05-11.