Skip to content

Note

This page mirrors the project's CHANGELOG.md, which is the canonical source.

Changelog

All notable changes to erm are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

0.4.0 - 2026-06-14

Changed

  • Skill / agent install guidance prefers uvx. The erm and erm-tune skills and AGENTS.md now recommend running erm via uvx erm … (no persistent install; uv caches the env after first run) and fall back to a Python venv (python3 -m venv + pip install erm) only where uv isn't available — replacing the prior pipx install / pip install recommendation.

Added

  • First-class video I/O (--video) — feed erm a video file and, by default, still get the cleaned audio only (.wav, today's behavior and the common "pull the audio out of this video" case). Pass --video to render the picture too: the output container is inferred from the input (-o's extension overrides), and A/V stays in sync by construction (≤1 frame) — both streams render from the same edit timeline with the same frame-snapped crossfades, and the picture is conformed to the audio master's exact length.
  • --video-splice {crossfade,cut}crossfade (default) dissolves each splice; cut makes hard jump cuts (both streams hard-concat, so they can't drift).
  • --vcodec / --crf / --preset — encoder knobs for the re-encoded picture (remove mode). --mode silence stream-copies the picture untouched (lossless, frame-exact). --crf reaches x264/x265, VP9, and AV1; --preset reaches x264/x265 and libsvtav1. When an explicitly-set value can't be honored by the chosen encoder, the CLI warns instead of dropping it silently.
  • --min-gap-ms with --video plays the removed footage through the injected pause (muted) instead of freezing the frame.
  • Audio is stored losslessly where the container allows (PCM in mov/mkv/avi); mp4 gets AAC 256k, webm gets Opus 160k.
  • validate gains an A/V-sync check (video outputs only): the picture and audio streams must end within ~1 frame of each other.
  • See the video render & A/V sync design doc for the A/V-sync derivation, and working with video for the user-facing guide.
  • --add-fillers and --remove-fillers — adjust the pass-1 word list relative to the defaults instead of replacing it. --fillers still overrides the whole set; --add-fillers "basically,like" unions words on top of it, and --remove-fillers "ah" subtracts (removal wins over additions). Lets you keep the built-in stems and add or drop a couple of words without re-typing the full list.

0.3.0 - 2026-06-12

Render modes and pause spacing. Every new behavior is off by default — a default run renders byte-identical audio to 0.2.x, and the only cut-list change is two additive fields.

Added

  • --mode {remove,silence} — choose how detected cuts are applied.
  • remove (default): excise each cut and splice the survivors with crossfades (the existing behavior; the timeline shrinks).
  • silence: mute each cut span in place with a single ffmpeg volume pass, preserving the input's exact duration. Use it to keep A/V sync, multi-track alignment, and caption/transcript timestamps intact. The room-tone overlay fills the muted holes with the natural floor.
  • --pad-pause-factor (with --pad-min-ms / --pad-max-ms) — remove mode. Retain a fraction of the silence each cut snapped over so tight splices keep a little breathing room. Context-aware and never adds time; default 0.0 removes the whole cut.
  • --min-gap-msremove mode. Guarantee at least N ms between the two words flanking a splice, injecting silence when the natural pause is shorter. Default 0.0 injects nothing; injected silence is filled by room tone.
  • Cut-list JSON fields mode and injected_gap_s; silence mode also reports muted_s.
  • Public API: render_silenced, pad_cuts, inject_min_gaps (exported from erm).
  • docs/modes-and-padding.md design note and a README Modes section.
  • This CHANGELOG.md.

Changed

  • Default --model is now large-v3 (was medium.en). The larger model catches more fillers and produces tighter word boundaries, improving detection quality out of the box. Override with --model medium.en / --model small.en for faster, lower-accuracy runs.
  • erm validate now checks output duration per mode, reading mode / injected_gap_s from the cut list — silence: output ≈ input; remove: output ≈ input − cuts + injected_gap_s. Cut lists without these fields default to remove / 0.0 and validate exactly as before.
  • erm warns when muted holes or injected gaps would be bare digital silence rather than a natural floor (--mode silence or --min-gap-ms combined with --no-room-tone).
  • erm warns when --pad-pause-factor or --min-gap-ms are passed with --mode silence, since those knobs only shape remove-mode splices and are inert there.

0.2.0 - 2026-06-12

Fixed

  • Transcription gracefully falls back to CPU when the CUDA runtime libraries can't be loaded. --device auto (default) probes the GPU and falls back with a warning; --device cpu skips the probe entirely. (#3)

Changed

  • Expanded the test suite with CLI, integration, and pure-helper coverage. (#4)

0.1.1 - 2026-04-28

Added

  • Release tooling: Makefile, build/publish scripts, and GitHub Actions workflow. (#1)

0.1.0 - 2026-04-28

Added

  • Initial release. erm strips disfluencies (um, uh, er, erm, ah, hmm, mhm, mm, uh-huh, plus any-length elongations) from English speech using faster-whisper word timestamps, three audio-domain detectors for fillers Whisper hides, and ffmpeg for the cuts.
  • Energy-minimum + zero-crossing boundary refinement, cut-size-scaled crossfades, optional denoising (none / pre / post / hybrid), and a looped room-tone undertone for a uniform noise floor.
  • erm validate subcommand: container sanity, duration math, and a no-filler-survives invariant.