Changelog¶

All notable changes to erm are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased ¶

0.4.0 - 2026-06-14¶

Changed¶

Skill / agent install guidance prefers uvx. The erm and erm-tune skills and AGENTS.md now recommend running erm via uvx erm … (no persistent install; uv caches the env after first run) and fall back to a Python venv (python3 -m venv + pip install erm) only where uv isn't available — replacing the prior pipx install / pip install recommendation.

Added¶

First-class video I/O (--video) — feed erm a video file and, by default, still get the cleaned audio only (.wav, today's behavior and the common "pull the audio out of this video" case). Pass --video to render the picture too: the output container is inferred from the input (-o's extension overrides), and A/V stays in sync by construction (≤1 frame) — both streams render from the same edit timeline with the same frame-snapped crossfades, and the picture is conformed to the audio master's exact length.
--video-splice {crossfade,cut} — crossfade (default) dissolves each splice; cut makes hard jump cuts (both streams hard-concat, so they can't drift).
--vcodec / --crf / --preset — encoder knobs for the re-encoded picture (remove mode). --mode silence stream-copies the picture untouched (lossless, frame-exact). --crf reaches x264/x265, VP9, and AV1; --preset reaches x264/x265 and libsvtav1. When an explicitly-set value can't be honored by the chosen encoder, the CLI warns instead of dropping it silently.
--min-gap-ms with --video plays the removed footage through the injected pause (muted) instead of freezing the frame.
Audio is stored losslessly where the container allows (PCM in mov/mkv/avi); mp4 gets AAC 256k, webm gets Opus 160k.
validate gains an A/V-sync check (video outputs only): the picture and audio streams must end within ~1 frame of each other.
See the video render & A/V sync design doc for the A/V-sync derivation, and working with video for the user-facing guide.
--add-fillers and --remove-fillers — adjust the pass-1 word list relative to the defaults instead of replacing it. --fillers still overrides the whole set; --add-fillers "basically,like" unions words on top of it, and --remove-fillers "ah" subtracts (removal wins over additions). Lets you keep the built-in stems and add or drop a couple of words without re-typing the full list.

0.3.0 - 2026-06-12¶

Render modes and pause spacing. Every new behavior is off by default — a default run renders byte-identical audio to 0.2.x, and the only cut-list change is two additive fields.

Added¶

--mode {remove,silence} — choose how detected cuts are applied.
remove (default): excise each cut and splice the survivors with crossfades (the existing behavior; the timeline shrinks).
silence: mute each cut span in place with a single ffmpeg volume pass, preserving the input's exact duration. Use it to keep A/V sync, multi-track alignment, and caption/transcript timestamps intact. The room-tone overlay fills the muted holes with the natural floor.
--pad-pause-factor (with --pad-min-ms / --pad-max-ms) — remove mode. Retain a fraction of the silence each cut snapped over so tight splices keep a little breathing room. Context-aware and never adds time; default 0.0 removes the whole cut.
--min-gap-ms — remove mode. Guarantee at least N ms between the two words flanking a splice, injecting silence when the natural pause is shorter. Default 0.0 injects nothing; injected silence is filled by room tone.
Cut-list JSON fields mode and injected_gap_s; silence mode also reports muted_s.
Public API: render_silenced, pad_cuts, inject_min_gaps (exported from erm).
docs/modes-and-padding.md design note and a README Modes section.
This CHANGELOG.md.

Changed¶

Default --model is now large-v3 (was medium.en). The larger model catches more fillers and produces tighter word boundaries, improving detection quality out of the box. Override with --model medium.en / --model small.en for faster, lower-accuracy runs.
erm validate now checks output duration per mode, reading mode / injected_gap_s from the cut list — silence: output ≈ input; remove: output ≈ input − cuts + injected_gap_s. Cut lists without these fields default to remove / 0.0 and validate exactly as before.
erm warns when muted holes or injected gaps would be bare digital silence rather than a natural floor (--mode silence or --min-gap-ms combined with --no-room-tone).
erm warns when --pad-pause-factor or --min-gap-ms are passed with --mode silence, since those knobs only shape remove-mode splices and are inert there.

0.2.0 - 2026-06-12¶

Fixed¶

Transcription gracefully falls back to CPU when the CUDA runtime libraries can't be loaded. --device auto (default) probes the GPU and falls back with a warning; --device cpu skips the probe entirely. (#3)

Changed¶

Expanded the test suite with CLI, integration, and pure-helper coverage. (#4)

0.1.1 - 2026-04-28¶

Added¶

Release tooling: Makefile, build/publish scripts, and GitHub Actions workflow. (#1)

0.1.0 - 2026-04-28¶

Added¶

Initial release. erm strips disfluencies (um, uh, er, erm, ah, hmm, mhm, mm, uh-huh, plus any-length elongations) from English speech using faster-whisper word timestamps, three audio-domain detectors for fillers Whisper hides, and ffmpeg for the cuts.
Energy-minimum + zero-crossing boundary refinement, cut-size-scaled crossfades, optional denoising (none / pre / post / hybrid), and a looped room-tone undertone for a uniform noise floor.
erm validate subcommand: container sanity, duration math, and a no-filler-survives invariant.

Changelog¶

Unreleased¶

0.4.0 - 2026-06-14¶

Changed¶

Added¶

0.3.0 - 2026-06-12¶

Added¶

Changed¶

0.2.0 - 2026-06-12¶

Fixed¶

Changed¶

0.1.1 - 2026-04-28¶

Added¶

0.1.0 - 2026-04-28¶

Added¶

Unreleased ¶