Thoughts

observation, mcp, ENN Pipeline, Audio Issues, Model Upgrade

3/25/2026

Session 37 (2026-03-25): ENN Pipeline - Major sync fix and model upgrade.

ROOT CAUSE OF ALL SYNC ISSUES: The voiceover.mp3 in R2 was the old 75-second untrimmed version while everything else (timestamps, Hedra clips, manifest) was based on the 61-second trimmed version. Railway was downloading the wrong audio file every render. All the framerate normalization, caption delays, and audio re-encoding were treating symptoms of a simple data mismatch.

KEY CHANGES:
- Switched from Hedra Character-3 to Hedra Avatar model (better lip-sync quality at similar cost, 7 credits/sec). Tested 5 models: Character-3 (worst), Avatar (best value), Omnia (truncates audio, dealbreaker), Kling Avatar v2 Std (good but 2x cost), Kling Avatar v2 Pro (best quality but 3x cost). Avatar outputs 480x832 at 25fps, acceptable for 2D art on mobile.
- Smart sentence-boundary splitting for multi-chunk Hedra: analyzes timestamps.json to find pauses after sentence-ending punctuation, splits audio at natural pauses instead of equal time intervals. Prevents mid-word glitches at chunk transitions.
- Pre-render integrity check in render-batch.mjs: compares local file sizes against R2 before rendering, aborts with clear error if any stale assets detected. Would have caught the voiceover mismatch immediately.
- generate-timestamps.mjs now uses ffprobe for actual MP3 duration instead of Whisper's last word end time.
- Added remote-asset-integrity.md to ~/.claude/rules/ as a cross-project principle.

RULES FOR PROCESS-SCRIPTS (not yet implemented):
1. Avatar shots must be distributed so every Hedra chunk gets at least one (otherwise that framing never appears on screen).
2. Flux stills before Hailuo clips should use slow_zoom_out (not zoom_in) so the still ends at the same scale as the animation's first frame.
3. Never request readable text/letters/words in Flux prompts (generates gibberish). Include "no readable text no letters no words" in prompts involving documents/signs.
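
A rough sketch of the sentence-boundary splitting idea from the key changes, not the actual pipeline code. The note does not show the layout of timestamps.json, so the word shape `{ word, start, end }` (times in seconds) is an assumption, as are the function names `findSentencePauses` and `chooseSplitPoints` and the 0.25-second minimum pause. The sketch only picks the split times; cutting the audio itself is out of scope.

```javascript
// ASSUMPTION: each entry looks like { word: "evening.", start: 0.35, end: 0.9 },
// with times in seconds. The real timestamps.json layout may differ.

// A word ends a sentence if it ends in . ! or ?, optionally followed by a closing quote or bracket.
const SENTENCE_END = /[.!?]["')\]]?$/;

// Collect candidate cut times: the midpoint of every silence that follows
// sentence-ending punctuation and lasts at least minPauseSec.
function findSentencePauses(words, minPauseSec = 0.25) {
  const pauses = [];
  for (let i = 0; i < words.length - 1; i++) {
    const gap = words[i + 1].start - words[i].end;
    if (SENTENCE_END.test(words[i].word.trim()) && gap >= minPauseSec) {
      pauses.push((words[i].end + words[i + 1].start) / 2);
    }
  }
  return pauses;
}

// For each equal-interval target time, pick the nearest sentence pause that
// lies after the previous split. If no pause is left, fall back to the
// equal-interval time; a fallback that would not advance past the previous
// split is skipped, so fewer split points may be returned.
function chooseSplitPoints(words, totalDurationSec, chunkCount, minPauseSec = 0.25) {
  const pauses = findSentencePauses(words, minPauseSec);
  const splits = [];
  let lastSplit = 0;
  for (let k = 1; k < chunkCount; k++) {
    const ideal = (totalDurationSec * k) / chunkCount;
    const usable = pauses.filter((p) => p > lastSplit);
    const candidates = usable.length > 0 ? usable : [ideal];
    const best = candidates.reduce((a, b) =>
      Math.abs(b - ideal) < Math.abs(a - ideal) ? b : a
    );
    if (best <= lastSplit) continue;
    splits.push(best);
    lastSplit = best;
  }
  return splits;
}
```

Cutting at the midpoint of the silence leaves a little padding on both sides of the cut, which is one way to avoid clipping the tail of a word. The total duration passed in should be the real file duration, such as the ffprobe value mentioned above, rather than the last word's end time.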
Updated mantis portraits with visible mouths for all 3 framings (Close/Medium/Wide). Dave created these manually. All 4 characters have new broadcaster portraits in characters/News Anchor/ folders.
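
The pre-render integrity check listed under the key changes could look roughly like the sketch below. It is not the code in render-batch.mjs. The inputs are assumed to be plain objects mapping an asset key to its size in bytes, and the names `findStaleAssets` and `assertAssetsFresh` are hypothetical. How the real script collects local sizes and R2 object sizes is not shown in the note.

```javascript
// ASSUMPTION: both arguments are objects like { "voiceover.mp3": 976000 },
// mapping asset key -> size in bytes. Gathering those sizes from disk and
// from R2 is left out of this sketch.

// Return one entry per local asset that is missing remotely or differs in size.
function findStaleAssets(localSizes, remoteSizes) {
  const problems = [];
  for (const [key, localBytes] of Object.entries(localSizes)) {
    if (!Object.hasOwn(remoteSizes, key)) {
      problems.push({ key, reason: "missing in remote", localBytes, remoteBytes: null });
    } else if (remoteSizes[key] !== localBytes) {
      problems.push({ key, reason: "size mismatch", localBytes, remoteBytes: remoteSizes[key] });
    }
  }
  return problems;
}

// Abort before rendering, with an error that names every stale asset.
function assertAssetsFresh(localSizes, remoteSizes) {
  const problems = findStaleAssets(localSizes, remoteSizes);
  if (problems.length > 0) {
    const lines = problems.map(
      (p) => `  ${p.key}: ${p.reason} (local ${p.localBytes} B, remote ${p.remoteBytes ?? "n/a"} B)`
    );
    throw new Error(`Stale remote assets detected, aborting render:\n${lines.join("\n")}`);
  }
}
```

A size comparison is cheap and would flag a 75-second file sitting where a 61-second one is expected, but it cannot detect two different files that happen to have the same size. Comparing content hashes would be stricter.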

People: Dave