Troubleshooting: symptom → knob¶
Describe the bad result, find the fix. Each entry names the likely cause, the knob(s) to turn, and which way. For the mechanism behind a fix, follow the link; for the tuning workflow that makes diagnosis cheap, see usage.md → the iterate loop.
Diagnose on a --dry-run first. Most "wrong cut" problems are visible in the
cut-list JSON without rendering — and the per-cut label tells you which
detector fired (see the label table), which
narrows the fix immediately.
Fillers still audible / not removed¶
The detector never flagged it, or it flagged it too conservatively. Check the
--dry-run JSON: is there a cut near the filler at all?
- No cut there → detection missed it. The default
--model large-v3is already the most accurate; lower--gap-min-ms(default 350) so shorter pauses get scanned; make sure--detect-gapsis on. - A cut there but too short → it's being trimmed conservatively. For trailing
"uhhh" drawls that survive,
--no-confirm-pitchlets the overlong detector cut without acoustic confirmation — but it raises the risk of clipping slow real speech, so use it with care. Background: detection.md.
Real words getting clipped or removed¶
Detection is too aggressive. Read the label on the offending cut:
<long:WORD>or<in:WORD>→ keep--confirm-pitchon (it exists to protect real speech); raise--intraword-min-ms(default 550) so shorter words aren't scanned internally.<gap>→ raise--gap-min-msso brief natural pauses aren't treated as fillers.- a literal word → your filler list is too broad; drop that word with
--remove-fillers WORD(no need to re-type the whole--fillerslist).
See detection.md → the guards.
Splices sound clipped / too abrupt¶
The cut left no breathing room and the neighboring words butt together. Raise
--pad-pause-factor (try 0.3–0.5) to keep a fraction of the cut's own silence,
and/or set --min-gap-ms (try 80–120) to guarantee a minimum gap. These two
differ — render-pipeline.md → Part 3.
Words run together right after a cut¶
Set a --min-gap-ms floor — it injects silence at any splice whose natural pause
fell below the floor. render-pipeline.md → Part 4.
Splices sound smeared / blurry¶
The crossfade is too long for the material. Lower --crossfade-factor (default
0.15) or --max-crossfade-ms (default 120). render-pipeline.md → Part 1.
Audible click or pop at a splice¶
An endpoint didn't land on silence / a zero crossing. Raise --search-ms
(default 60) to give the refiner more room to find a local energy minimum, and
keep room tone on to mask the seam. render-pipeline.md → refinement.
Noise floor "pumps" or drops out at edits¶
The floor isn't uniform across splices. Keep --room-tone on (it lays a constant
floor everywhere); adjust --room-tone-level-db (default −12; −12 to −20 is the
usual range); or switch to --denoise post to smooth the whole output's floor in
one pass. denoise-and-room-tone.md.
silence-mode output has dead-sounding holes¶
The muted spans are bare digital zero because room tone is off. Re-enable
--room-tone — silence mode depends on it to fill the holes with the natural
floor (this is the warning erm prints). render-pipeline.md → the silence ↔ room-tone dependency.
--min-gap-ms errors on a multichannel file¶
Min-gap injection mints silence that must match the channel layout, and only
mono/stereo have an unambiguous one. Downmix the input to stereo first, or drop
--min-gap-ms. render-pipeline.md → Part 4.
GPU / CUDA error or warning¶
erm picked the GPU but the CUDA runtime libraries aren't installed. With
--device auto (default) it warns and falls back to CPU automatically. To
silence it, pass --device cpu; to actually use the GPU, install the CUDA wheels
(README's Transcription device).
Details: transcription.md → CUDA fallback.
Too slow¶
Use a smaller --model (small.en); add --device cuda if you have the CUDA
runtime; set --compute-type int8 for faster, lower-precision inference. See the
fastest-pass recipe.
Output over- or under-denoised¶
Tune the denoiser: --denoise-nr (reduction strength dB, default 12) up for more
aggressive cleanup, or --denoise-nf (noise floor dB, default −25). If denoising
is doing more harm than good, switch --denoise mode (none/post). See
usage.md → which denoise.