Note
This page mirrors the project's
CHANGELOG.md,
which is the canonical source.
Changelog¶
All notable changes to erm are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased¶
0.4.0 - 2026-06-14¶
Changed¶
- Skill / agent install guidance prefers
uvx. Theermanderm-tuneskills andAGENTS.mdnow recommend running erm viauvx erm …(no persistent install; uv caches the env after first run) and fall back to a Python venv (python3 -m venv+pip install erm) only whereuvisn't available — replacing the priorpipx install/pip installrecommendation.
Added¶
- First-class video I/O (
--video) — feederma video file and, by default, still get the cleaned audio only (.wav, today's behavior and the common "pull the audio out of this video" case). Pass--videoto render the picture too: the output container is inferred from the input (-o's extension overrides), and A/V stays in sync by construction (≤1 frame) — both streams render from the same edit timeline with the same frame-snapped crossfades, and the picture is conformed to the audio master's exact length. --video-splice {crossfade,cut}—crossfade(default) dissolves each splice;cutmakes hard jump cuts (both streams hard-concat, so they can't drift).--vcodec/--crf/--preset— encoder knobs for the re-encoded picture (remove mode).--mode silencestream-copies the picture untouched (lossless, frame-exact).--crfreaches x264/x265, VP9, and AV1;--presetreaches x264/x265 andlibsvtav1. When an explicitly-set value can't be honored by the chosen encoder, the CLI warns instead of dropping it silently.--min-gap-mswith--videoplays the removed footage through the injected pause (muted) instead of freezing the frame.- Audio is stored losslessly where the container allows (PCM in mov/mkv/avi); mp4 gets AAC 256k, webm gets Opus 160k.
validategains an A/V-sync check (video outputs only): the picture and audio streams must end within ~1 frame of each other.- See the video render & A/V sync design doc for the A/V-sync derivation, and working with video for the user-facing guide.
--add-fillersand--remove-fillers— adjust the pass-1 word list relative to the defaults instead of replacing it.--fillersstill overrides the whole set;--add-fillers "basically,like"unions words on top of it, and--remove-fillers "ah"subtracts (removal wins over additions). Lets you keep the built-in stems and add or drop a couple of words without re-typing the full list.
0.3.0 - 2026-06-12¶
Render modes and pause spacing. Every new behavior is off by default — a default run renders byte-identical audio to 0.2.x, and the only cut-list change is two additive fields.
Added¶
--mode {remove,silence}— choose how detected cuts are applied.remove(default): excise each cut and splice the survivors with crossfades (the existing behavior; the timeline shrinks).silence: mute each cut span in place with a single ffmpegvolumepass, preserving the input's exact duration. Use it to keep A/V sync, multi-track alignment, and caption/transcript timestamps intact. The room-tone overlay fills the muted holes with the natural floor.--pad-pause-factor(with--pad-min-ms/--pad-max-ms) — remove mode. Retain a fraction of the silence each cut snapped over so tight splices keep a little breathing room. Context-aware and never adds time; default0.0removes the whole cut.--min-gap-ms— remove mode. Guarantee at least N ms between the two words flanking a splice, injecting silence when the natural pause is shorter. Default0.0injects nothing; injected silence is filled by room tone.- Cut-list JSON fields
modeandinjected_gap_s;silencemode also reportsmuted_s. - Public API:
render_silenced,pad_cuts,inject_min_gaps(exported fromerm). docs/modes-and-padding.mddesign note and a README Modes section.- This
CHANGELOG.md.
Changed¶
- Default
--modelis nowlarge-v3(wasmedium.en). The larger model catches more fillers and produces tighter word boundaries, improving detection quality out of the box. Override with--model medium.en/--model small.enfor faster, lower-accuracy runs. erm validatenow checks output duration per mode, readingmode/injected_gap_sfrom the cut list —silence:output ≈ input;remove:output ≈ input − cuts + injected_gap_s. Cut lists without these fields default toremove/0.0and validate exactly as before.ermwarns when muted holes or injected gaps would be bare digital silence rather than a natural floor (--mode silenceor--min-gap-mscombined with--no-room-tone).ermwarns when--pad-pause-factoror--min-gap-msare passed with--mode silence, since those knobs only shape remove-mode splices and are inert there.
0.2.0 - 2026-06-12¶
Fixed¶
- Transcription gracefully falls back to CPU when the CUDA runtime libraries
can't be loaded.
--device auto(default) probes the GPU and falls back with a warning;--device cpuskips the probe entirely. (#3)
Changed¶
- Expanded the test suite with CLI, integration, and pure-helper coverage. (#4)
0.1.1 - 2026-04-28¶
Added¶
- Release tooling:
Makefile, build/publish scripts, and GitHub Actions workflow. (#1)
0.1.0 - 2026-04-28¶
Added¶
- Initial release.
ermstrips disfluencies (um,uh,er,erm,ah,hmm,mhm,mm,uh-huh, plus any-length elongations) from English speech usingfaster-whisperword timestamps, three audio-domain detectors for fillers Whisper hides, and ffmpeg for the cuts. - Energy-minimum + zero-crossing boundary refinement, cut-size-scaled
crossfades, optional denoising (
none/pre/post/hybrid), and a looped room-tone undertone for a uniform noise floor. erm validatesubcommand: container sanity, duration math, and a no-filler-survives invariant.