Thoughts

12 thoughts about "UFO Pipeline"

UFO Pipeline Session 29 (2026-03-22): Phase 7 YouTube publishing core build complete. Built youtube-auth.mjs (OAuth2 for Brand Account), generate-thumbnails.mjs (Flux + sharp text overlay), youtube-publish.mjs (scheduled uploads with metadata and DB tracking), ffmpeg-reprocess.mjs (bakes thumbnail as first frame + 1s end hold). Discovered YouTube Shorts custom thumbnails via API are unreliable: API accepts upload without error but silently drops or reverts them. Studio UI also doesn't support custom Shorts thumbnails. Workaround: ffmpeg prepends thumbnail image as 0.5s first frame so YouTube auto-selects it. All 43 shorts reprocessed with baked thumbnails and uploaded to R2. Test uploaded 6 shorts, verified working, then deleted them to clean channel for launch. Channel name: Exo News Network. Launch date: April 1, 2026. Upload plan: 5/day over 9 sessions with character rotation for variety. Thumbnail design principle from Dave: follow real best practices, dynamic text placement, never cover faces, no rigid positioning rules. Remaining: upload all 43 in batches of 5, update Remotion to bake thumbnail frame natively for future batches.
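
The upload plan above (43 shorts, 5 per session, rotating characters) can be sketched as a small scheduling helper. This is only an illustration: the `Short` shape, `rotateByCharacter`, and `planSessions` are hypothetical names, and round-robin interleaving is one assumed way to get "character rotation for variety".

```typescript
// Hypothetical shape; only the id and character matter for scheduling.
interface Short {
  id: string;
  character: string;
}

// Interleave shorts round-robin by character so consecutive uploads vary.
function rotateByCharacter(shorts: Short[]): Short[] {
  const queues = new Map<string, Short[]>();
  for (const s of shorts) {
    const q = queues.get(s.character) ?? [];
    q.push(s);
    queues.set(s.character, q);
  }
  const ordered: Short[] = [];
  while (ordered.length < shorts.length) {
    // Take at most one short from each character per pass.
    queues.forEach((q) => {
      const next = q.shift();
      if (next) ordered.push(next);
    });
  }
  return ordered;
}

// Split the rotated list into upload sessions of at most `perSession` shorts.
function planSessions(shorts: Short[], perSession = 5): Short[][] {
  const ordered = rotateByCharacter(shorts);
  const sessions: Short[][] = [];
  for (let i = 0; i < ordered.length; i += perSession) {
    sessions.push(ordered.slice(i, i + perSession));
  }
  return sessions;
}
```

With 43 shorts this yields 9 sessions: eight of 5 and a final one of 3.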

People: Dave

UFO Pipeline Session 28 (2026-03-20): Fixed Railway OOM that was killing renders for "The Maid and the Beings of Light" (8f127e42). Root cause was three compounding memory issues in remotion/src/server.ts: (1) downloadFile() buffered entire assets in RAM via Buffer.from(await res.arrayBuffer()), and Promise.all() downloaded all 9+ assets simultaneously, creating 160-220MB peak RAM usage. (2) Upload read the entire rendered MP4 into RAM via fs.readFileSync() (40-100MB per video). (3) Remotion's OffthreadVideo decodes multiple Hedra video streams during avatar-led renders, adding to baseline memory pressure. Fixes applied: (1) Streaming downloads using ReadableStreamDefaultReader piped to fs.createWriteStream(), zero RAM buffering per asset. (2) Sequential downloads via for...of loop instead of Promise.all(), only one asset streams at a time, adds ~5-10s download time but saves 150+MB peak RAM. (3) Streaming upload using fs.createReadStream() with duplex: "half" fetch option instead of readFileSync. Total peak memory reduction estimated at 200-300MB. After deploying the fix, the previously 3x-OOM-failing short rendered successfully in 102 seconds (45.5MB output). All 43 YouTube Shorts are now complete in Cloudflare R2 with updated captions. Ready for Phase 7: YouTube publishing to Project Veilgate channel. Launch date April 1, 2026.
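
A minimal sketch of download fixes (1) and (2), assuming Node 18+ with global `fetch`. The `Asset` shape and the function names are hypothetical, not the actual code in remotion/src/server.ts; the injectable `fetchFn` is there only to make the sketch easy to exercise without a network. The streaming upload side is not shown.

```typescript
import { createWriteStream } from "node:fs";
import { once } from "node:events";

// Hypothetical asset descriptor; the real server.ts may use other fields.
interface Asset {
  url: string;
  dest: string;
}

type FetchFn = (url: string) => Promise<Response>;

// Stream one response body to disk chunk by chunk instead of buffering it whole.
async function streamToFile(res: Response, dest: string): Promise<number> {
  if (!res.ok || !res.body) {
    throw new Error(`download failed with HTTP ${res.status}`);
  }
  const reader = res.body.getReader();
  const out = createWriteStream(dest);
  let bytes = 0;
  for (;;) {
    const chunk = await reader.read();
    if (chunk.done) break;
    bytes += chunk.value.byteLength;
    // Respect backpressure: wait for "drain" when the write buffer is full.
    if (!out.write(chunk.value)) await once(out, "drain");
  }
  out.end();
  await once(out, "finish");
  return bytes;
}

// Download assets one at a time so only a single stream is open at once.
async function downloadSequentially(
  assets: Asset[],
  fetchFn: FetchFn = fetch
): Promise<number> {
  let total = 0;
  for (const asset of assets) {
    total += await streamToFile(await fetchFn(asset.url), asset.dest);
  }
  return total;
}
```

Peak memory here is bounded by the chunk size and the write buffer rather than by the asset size. Error handling is deliberately minimal.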

UFO Pipeline Session 26 (2026-03-20): Rewrote the Remotion Caption component with 3 fixes and 5 caption styles. Timing fix: words now group into display lines by punctuation, lines only appear when first word's Whisper start timestamp is reached (no more premature display after pauses). Spacing fix: removed fontSize toggle, uses transform:scale only so layout gaps are preserved. 5 styles implemented: pop (single word spring scale-in), karaoke (full line with gold highlight, default), subtitle (bottom-third bar clean sans-serif), bold (2-3 words large with cyan key-word highlights), wave (words slide up building a line). ShortComposition maps manifest caption_style field to component. Existing manifests use "word_by_word_animated" which maps to "karaoke". Also updated the process-scripts skill with Step 3b: voice rewrite for recast characters. When Claude assigns a different character than Gemini originally wrote, HOOK/BODY/PAYOFF get rewritten in the new character's voice using character bible personality notes. Test rendered 2 shorts locally (avatar-led + faceless), both clean. Dave approved. Next: deploy updated Remotion to Railway, re-render all 43 shorts with --force, then Phase 7 YouTube publishing.
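
The timing fix and the style mapping described above can be sketched roughly as follows. This is not the actual Caption component: the `Word` shape, the function names, the punctuation set, and the `maxWords` cap are assumptions, and the sketch keeps the previous line on screen until the next line's first word starts.

```typescript
// Assumed word shape from Whisper word-level timestamps (seconds).
interface Word {
  text: string;
  start: number;
  end: number;
}

type CaptionStyle = "pop" | "karaoke" | "subtitle" | "bold" | "wave";

// Close a display line after any word ending in punctuation.
// maxWords is an assumed safety cap for long unpunctuated runs.
function groupIntoLines(words: Word[], maxWords = 6): Word[][] {
  const lines: Word[][] = [];
  let current: Word[] = [];
  for (const w of words) {
    current.push(w);
    if (/[.,!?;:]$/.test(w.text) || current.length >= maxWords) {
      lines.push(current);
      current = [];
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}

// A line becomes visible only once its first word's start timestamp is reached.
function activeLine(lines: Word[][], t: number): Word[] | null {
  let active: Word[] | null = null;
  for (const line of lines) {
    if (line[0].start <= t) active = line;
    else break;
  }
  return active;
}

// Unknown or legacy values such as "word_by_word_animated" fall back to karaoke.
function resolveCaptionStyle(raw: string | undefined): CaptionStyle {
  if (
    raw === "pop" ||
    raw === "karaoke" ||
    raw === "subtitle" ||
    raw === "bold" ||
    raw === "wave"
  ) {
    return raw;
  }
  return "karaoke";
}
```

In a Remotion component, `t` would typically be the current frame divided by the composition fps.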

People: Dave

UFO Pipeline: Character perspective rewrite decision. When the process-scripts skill recasts a character (changes Gemini's suggested character to a different one based on topic affinities/distribution), it must now also rewrite the script's HOOK, BODY, and PAYOFF in the new character's voice using the character bible's personality notes. Previously it only changed the CHARACTER field without rewriting the script voice, causing mismatches (e.g., Reptilian voice on a Grey-assigned short). Gemini Gems are also being updated to write in-character from the start (non-human perspective), so most scripts (~80%) arrive already correct. The rewrite step is the safety net for the ~20% that get recast. Both changes needed: Gem update for quality-in, process-scripts rewrite for recast consistency.
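
The recast rule can be sketched as a guard around the rewrite step. The `Script` shape, `recast`, and the `VoiceRewriter` callback are hypothetical; in the real skill the rewrite would be a model call guided by the character bible, not a synchronous function.

```typescript
interface Script {
  character: string;
  hook: string;
  body: string;
  payoff: string;
}

// Hypothetical rewriter: stands in for the character-bible-driven voice rewrite.
type VoiceRewriter = (text: string, character: string) => string;

function recast(script: Script, assigned: string, rewrite: VoiceRewriter): Script {
  // Scripts that keep their original character pass through untouched.
  if (assigned === script.character) return script;
  // A recast changes the CHARACTER field and all three script sections together.
  return {
    character: assigned,
    hook: rewrite(script.hook, assigned),
    body: rewrite(script.body, assigned),
    payoff: rewrite(script.payoff, assigned),
  };
}
```

Keeping the field change and the rewrite in one function means a script can never end up with a new CHARACTER value and the old voice.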

UFO Pipeline Session 25: Phase 6 complete. All 43 shorts rendered via Railway (0 failures, 1.73GB, ~59 min). Caption issues identified for fixing: (1) Word timing off - words appear on screen before being spoken, especially after commas/periods where there's a natural pause. The Caption component groups 3 words in a sliding window but doesn't respect pause boundaries. (2) Active word highlight (yellow + scale to 1.1 + fontSize 72 vs 64) eliminates visual spacing between words, making 3 words look like one blob. (3) Only one caption style exists - need variety matching popular YouTube/TikTok/Reels styles: single word pop, 2-3 word karaoke highlight, bottom-third subtitle bar, MrBeast bold centered, etc. Caption style field already exists in manifest (visual_plan.caption_style) but isn't used. (4) Scripts written from human perspective but characters are non-human - jarring mismatch. Future Gemini prompt fix, not worth regenerating existing batch. (5) Phase 7 YouTube publishing needs custom thumbnail support (API supports it via thumbnails.set) and best practices for titles/metadata/hashtags. Caption fixes are Remotion-only changes - no ElevenLabs or Hedra cost. Just re-render with --force flag (~59 min, ~$0).

UFO Pipeline Phase 4: All 34 Hedra avatar clips generated with new 9:16 portraits (768x1344). Batches: a1 (13, prior run), a2 (8, 1 retry), b1 (7 character + 2 faceless skipped), b2 (6, 4 retries for network errors). All clips use rotated portraits (hedra_01/02/03.png per character via short_id hash). Combined with 233 Flux stills and 25 Hailuo clips, all visual generation is now complete (34 Hedra + 233 Flux + 25 Hailuo = 292 assets). Remaining: upload to R2, crop 22 old wrong-dimension clips in Remotion. Hedra "fetch failed" errors are transient, individual retries always succeed. ~15 Hedra credits used this session.
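
Portrait rotation via short_id hash might look like the sketch below. The hash itself is an assumption (a simple 31-multiplier string hash); the actual script may hash differently, so the portrait chosen for a given short here need not match production.

```typescript
// Assumed hash: 31-multiplier rolling hash kept in unsigned 32-bit range.
function hashShortId(shortId: string): number {
  let hash = 0;
  for (let i = 0; i < shortId.length; i++) {
    hash = (hash * 31 + shortId.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Map a short to one of the character's portraits (hedra_01.png .. hedra_03.png).
// The same short_id always picks the same portrait, so retries stay consistent.
function portraitFor(shortId: string, variants = 3): string {
  const index = (hashShortId(shortId) % variants) + 1;
  return `hedra_0${index}.png`;
}
```

Because the choice is derived from the id rather than from a counter, a retried clip reuses the portrait of the failed attempt.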

UFO Pipeline Phase 4 progress: All 233 Flux stills generated across 43 manifests (4 batches), 0 failures, ~41 min total via fal.ai. Added skip-if-exists check to generate-flux-still.mjs so existing stills aren't re-generated. 262 flux PNGs on disk (233 current + 29 orphaned from pre-enrichment shot indices). Hailuo clips also complete (25/25). Hedra avatar clips still in progress (parallel chat). Next: finish Hedra clips, upload all visual assets to R2, then Phase 5 (Remotion).

UFO Pipeline - Session 10 handoff (2026-03-18): Dave completed all outfit variants for Reptilian, Mantis, and Little Green Man via Gemini + Krea. All Phase 1 blockers are now cleared. Cloudflare R2 credentials partially filled in .env (R2_ACCOUNT_ID and R2_ACCESS_KEY_ID present, R2_SECRET_ACCESS_KEY and R2_ENDPOINT still needed from Dave). Next session tasks in order: (1) Dave fills remaining R2 creds in .env, (2) create R2 bucket via API or Cloudflare dashboard, (3) train custom LoRAs for Reptilian/Mantis/LGM on fal.ai (Dave reviews training image sets, same workflow as Grey: Krea base + outfit variants, ~$2 each), (4) populate Railway Postgres character registry (characters + character_outfits tables), (5) upload all portraits to R2. Then Phase 2: Gemini prompt chain development. Railway MCP available as deferred tools. Railway project ID: e86c55d4-f102-4c34-8b13-c21be10d5e4d.

People: Dave

UFO Pipeline - Phase 1 status as of 2026-03-18: Voices DONE for all 4 characters. Outfit variants DONE for Grey only (MIB, Lab, Priest, Pope via Gemini Gems). Dave is now generating outfit variants for Reptilian, Mantis, and Little Green Man manually via Gemini + Krea. Once outfits are done, we train custom LoRAs for those 3 characters on fal.ai (Grey LoRA already trained). R2 bucket creation and Railway Postgres character registry population are deferred until outfits and portraits are ready. Cloudflare R2 API credentials needed from Dave when the time comes.

People: Dave

UFO Pipeline - ElevenLabs API billing: uses the same Creator plan credit pool as the web UI. Not a separate charge. ~1,000 credits per minute of generated audio. Creator plan = 100,000 credits/month = ~100 minutes of voiceover. Plenty for 20+ shorts/week (each 30-60s of spoken audio). TTS model: eleven_multilingual_v2.
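
The credit arithmetic above as a quick check. The rates are the approximate figures from this note; the `weeksPerMonth` value of 52/12 is an assumption, and the function names are illustrative.

```typescript
const CREDITS_PER_MINUTE = 1000; // approximate rate from the note above
const MONTHLY_CREDITS = 100000; // Creator plan allowance

// Minutes of voiceover the monthly allowance covers.
function minutesAvailable(): number {
  return MONTHLY_CREDITS / CREDITS_PER_MINUTE;
}

// Estimated monthly credit use for a given publishing cadence.
function monthlyCredits(
  shortsPerWeek: number,
  secondsPerShort: number,
  weeksPerMonth = 52 / 12 // assumed average month length
): number {
  const minutes = (shortsPerWeek * weeksPerMonth * secondsPerShort) / 60;
  return Math.round(minutes * CREDITS_PER_MINUTE);
}
```

At 20 shorts/week the estimate stays under the allowance even at 60 s each. Under these assumptions the ceiling at 60 s per short is about 23 shorts/week, so the headroom is largest when shorts run closer to 30 s.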

3/18/2026

UFO Pipeline - Hedra locked as production lip sync tool. Sessions 8-9 A/B tested Hedra Character-3 vs OmniHuman 1.5 (via fal.ai) vs VEED Fabric 1.0 (via fal.ai). Hedra won on 2D art style preservation, which is the critical metric. Cost breakdown: Hedra Character-3 uses 3 credits/sec at 540p, 6 credits/sec at 720p. Basic tier ($15/mo) = ~2,000 credits, Creator tier ($30/mo) = ~4,000 credits, Professional tier ($60/mo) = ~12,000 credits. Creator at $30/mo is the sweet spot for 20+ shorts/week. OmniHuman and Fabric are viable fallbacks if Hedra becomes unavailable.
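
The tier comparison reduces to seconds of lip sync per month. A small sketch using the approximate credit figures from this note; the names are illustrative.

```typescript
// Approximate monthly credits per tier, as listed above.
const TIER_CREDITS = { basic: 2000, creator: 4000, professional: 12000 };

// Character-3 credit cost per second of output.
const CREDITS_PER_SECOND = { "540p": 3, "720p": 6 };

type Tier = keyof typeof TIER_CREDITS;
type Resolution = keyof typeof CREDITS_PER_SECOND;

// Whole seconds of lip-sync video a tier covers per month at a resolution.
function secondsPerMonth(tier: Tier, resolution: Resolution): number {
  return Math.floor(TIER_CREDITS[tier] / CREDITS_PER_SECOND[resolution]);
}
```

Creator at 540p works out to about 1,333 s (roughly 22 minutes) per month. Spread over 20 shorts/week that is on the order of 15 s of lip sync per short if every short were avatar-led; faceless shorts consume none.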

UFO Pipeline Session 6 Summary (2026-03-18): Phase 1.5 Grey proof of concept, major progress.
LORA TRAINING: Trained Grey custom LoRA on fal.ai ($1.76, 58 training images from Krea base poses + Gemini outfit variants). Trigger word: grey_kael. Generated 4 test scenes (desert-landing, mib-alley, lab-examination, pope-ceremony) using Grey LoRA + HRDFLS style LoRA stack. Character identity and 2D art style hold across all scenes and outfits.
HAILUO VIDEO A/B TEST: Tested two approaches via fal.ai. (A) Image-to-video using Flux-generated stills as first frame: works well, minor face/body morphing on camera pans but acceptable for b-roll. (B) Subject reference with portrait: fails completely with 2D illustrated characters ("Unprocessable Entity"). Pipeline decision locked: all Hailuo usage will be image-to-video with Flux-composed first frames. Best for slow camera moves and atmospheric shots where character is mostly stationary. Complex body motion (walking, turning) causes morphing.
ELEVENLABS VOICE: Set up API (Creator plan). Designed Grey's voice via Voice Design v3 ("Soft, quiet male voice with an otherworldly calm. Slightly breathy, unnervingly gentle."). Generated 9 previews, Dave picked winner. Voice saved as "Grey (Kael)", permanent voice ID: mTnkD8SvErH27JUwwM1J. Test voiceover clip generated at characters/voices/grey_test_voiceover.mp3.
HEDRA BLOCKED: Basic tier ($15/mo) doesn't include API access, needs Creator tier ($30/mo) minimum. Web app also returning "failed to fetch" errors (service issue). hedra-node SDK v0.1.2 is outdated, points to deprecated mercury.dev.dream-ai.com instead of production api.hedra.com/web-app/public. SDK env var expects X_API_KEY not HEDRA_API_KEY. Hedra test deferred until service stabilizes and tier is upgraded.
KEY DECISIONS: Hailuo subject reference is dead for this project (2D art incompatible). Visual pipeline structure: Flux generates composed stills with LoRA (character identity baked in), Hailuo animates them (subtle motion only), Hedra handles talking head lip-sync (pending validation). Dave swapped grey_default_front.png portrait to a cleaner version for Hedra input.
REMAINING PHASE 1.5: Hedra talking head test, sample script, Whisper timestamps, b-roll generation, Remotion assembly.

People: Dave