Recipes: command lines for common jobs
Copy-paste starting points for the situations that come up most. Each recipe explains why it uses those flags. For the reasoning behind a setting, follow the links into usage.md and the internals docs; for full flag defaults, see the README.
Output and cut-list paths are auto-named next to the input unless you pass -o /
--json, so the commands below omit them.
Quick clean: start here
The default pipeline: remove mode, hybrid denoise, room tone on, all
detectors on. Most recordings need nothing else.
Inspect before committing
Prints the cut list (and writes the JSON) without rendering — the fast way to see what would be cut and tune flags before paying for a render. This is the core of the iterate loop.
Conversational podcast that keeps its breathing room
Removes the filler but doesn't slam the neighboring words together: --pad-pause-factor
keeps half of the pause that was already inside the cut, and --min-gap-ms
guarantees at least 100 ms between the flanking words. The result sounds natural
for conversational speech. For how the two settings differ, see
render-pipeline.md → Part 3.
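As a rough illustration of the arithmetic, here is a minimal Python sketch. It assumes the two settings combine as a simple floor: keep a fraction of the original pause, but never less than the minimum gap. The function name `retained_gap_ms` is hypothetical and not part of erm, and the default values are the ones this recipe describes, not necessarily erm's defaults. The actual rule is the one documented in render-pipeline.md → Part 3.

```python
def retained_gap_ms(pause_ms: float,
                    pad_pause_factor: float = 0.5,
                    min_gap_ms: float = 100.0) -> float:
    """Gap left between the flanking words after a filler is cut.

    Illustrative only: assumes the two settings combine as a floor.
    """
    # Keep a fraction of the pause that was already inside the cut...
    kept = pause_ms * pad_pause_factor
    # ...but never let the flanking words sit closer than the minimum gap.
    return max(kept, min_gap_ms)


# A long pause: half of it survives.
print(retained_gap_ms(400))
# A short pause: half would be too tight, so the minimum gap takes over.
print(retained_gap_ms(120))
```

Under this assumption, the factor governs long pauses and the minimum gap only matters for short ones.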
Video / screencast that must keep caption & A/V timing
Mutes each filler in place instead of excising it, so the output is exactly the
same length and caption timestamps and lip-sync don't drift. Leave room tone on; it
fills the muted holes with the recording's natural noise floor. To render the
picture too (not just the audio), add --video (see working with video). Background:
usage.md → which mode.
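To see why excising drifts timestamps while muting does not, here is a small Python sketch that maps an input timestamp onto the output timeline. It is an illustration only: `output_time_ms` and the example cut list are hypothetical, and the mode names simply mirror the "remove" and "silence" modes this page refers to.

```python
def output_time_ms(t_ms: int, cuts: list[tuple[int, int]], mode: str) -> int:
    """Map an input timestamp to the output timeline.

    cuts: non-overlapping (start_ms, end_ms) filler spans in input time.
    """
    if mode == "silence":
        # Fillers are muted in place, so the timeline is unchanged.
        return t_ms
    # "remove": every cut (or part of a cut) before t pulls t earlier.
    removed = sum(min(end, t_ms) - start for start, end in cuts if start < t_ms)
    return t_ms - removed


cuts = [(3000, 3400), (7000, 7250)]  # two fillers, 650 ms in total
caption_at = 10_000                  # a caption cue at 10 s

print(output_time_ms(caption_at, cuts, "remove"))   # cue lands 650 ms early
print(output_time_ms(caption_at, cuts, "silence"))  # cue stays where it was
```

The drift in remove mode grows with every filler that precedes the cue, which is why a long recording can end up seconds out of sync.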
Multitrack stem you'll re-mix against other tracks
Same reasoning as for video: excising fillers from one stem would de-sync it from
the other mics. Silence mode keeps every track frame-aligned.
Noisy room
Keeps the default hybrid denoise but pushes it harder (--denoise-nr is
reduction strength in dB, --denoise-nf the noise floor). If the floor still
shifts at edits, lower --room-tone-level-db toward -18. Mechanism:
denoise-and-room-tone.md.
Already-clean studio capture
Minimal-touch pass: no denoising, no added floor. Use only when the recording is
genuinely clean — without room tone, splices and any silence/min-gap holes
get no floor fill.
Fastest possible pass
Smaller model + GPU + 8-bit compute. Noticeably faster; slightly coarser word
boundaries and filler coverage. Drop --device cuda if you don't have the CUDA
runtime installed (see transcription.md).
Maximum filler coverage
large-v3 (the default) already catches the most fillers and lands the tightest
boundaries; the lower gap threshold scans shorter pauses for dropped fillers.
Slower, more thorough.
Custom filler vocabulary
Three flags shape pass 1's word list. They compose in order — --fillers
defines the set, --add-fillers unions on top, --remove-fillers subtracts
(removal wins):
# Replace the built-in list entirely.
erm input.wav --fillers "um,uh,er,like"
# Keep the defaults and add your own verbal tics — the common case.
erm input.wav --add-fillers "basically,like,you-know"
# Keep the defaults but drop one that over-matches your voice.
erm input.wav --remove-fillers "ah"
Reach for --add-fillers instead of --fillers when you just want the
defaults plus a few words: --fillers makes you re-type the whole built-in
list, and forgetting a stem silently drops it.
Caveat: automatic elongation matching (ummmm → um) only applies to the
built-in stems — a custom word like basically matches verbatim only. See
detection.md → pass 1.
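One way to picture the caveat is the Python sketch below. It assumes elongation matching works by collapsing runs of a repeated letter before comparing against the built-in stems. That is a guess made for illustration, not erm's documented algorithm; `squeeze`, `is_filler`, and the `BUILTIN_STEMS` placeholder are all hypothetical names.

```python
import re

# Placeholder stems (an assumption; not the real built-in list).
BUILTIN_STEMS = frozenset({"um", "uh"})


def squeeze(word: str) -> str:
    """Collapse each run of a repeated character to a single one."""
    return re.sub(r"(.)\1+", r"\1", word)


def is_filler(token: str,
              builtin: frozenset[str] = BUILTIN_STEMS,
              custom: frozenset[str] = frozenset()) -> bool:
    token = token.lower()
    # Custom words match verbatim only.
    if token in custom:
        return True
    # Built-in stems tolerate elongation: compare the collapsed forms.
    return squeeze(token) in {squeeze(stem) for stem in builtin}


custom = frozenset({"basically"})
print(is_filler("ummmm"))                       # elongated built-in stem
print(is_filler("basically", custom=custom))    # verbatim custom word
print(is_filler("basicallyyy", custom=custom))  # elongated custom word
```

If a speaker habitually stretches a custom word, the practical workaround under this model is to list the stretched spellings explicitly in --add-fillers.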
Got a result that's wrong rather than a job to set up? Use troubleshooting.md.