Thoughts

4 thoughts of type "observation" about "UFO Pipeline" in the last 30 days

UFO Pipeline - Phase 1 status as of 2026-03-18: Voices DONE for all 4 characters. Outfit variants DONE for Grey only (MIB, Lab, Priest, Pope via Gemini Gems). Dave is now generating outfit variants for Reptilian, Mantis, and Little Green Man manually via Gemini + Krea. Once outfits are done, we train custom LoRAs for those 3 characters on fal.ai (Grey LoRA already trained). R2 bucket creation and Railway Postgres character registry population are deferred until outfits and portraits are ready. Cloudflare R2 API credentials needed from Dave when the time comes.

People: Dave

UFO Pipeline - ElevenLabs API billing: uses the same Creator plan credit pool as the web UI. Not a separate charge. ~1,000 credits per minute of generated audio. Creator plan = 100,000 credits/month = ~100 minutes of voiceover. Plenty for 20+ shorts/week (each 30-60s of spoken audio). TTS model: eleven_multilingual_v2.
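A quick way to sanity-check the credit math above is a small calculator. This is only a sketch: the credit figures are the approximate ones from this note, while `WEEKS_PER_MONTH` and the function names are assumptions introduced for illustration.

```python
# Figures taken from the note above (approximate); WEEKS_PER_MONTH is an
# assumed averaging constant, not something the note specifies.
CREDITS_PER_MONTH = 100_000        # Creator plan pool
CREDITS_PER_AUDIO_MINUTE = 1_000   # ~1,000 credits per minute of audio
WEEKS_PER_MONTH = 52 / 12


def monthly_audio_minutes(shorts_per_week: int, seconds_per_short: float) -> float:
    """Minutes of spoken audio needed per month at a given cadence."""
    return shorts_per_week * WEEKS_PER_MONTH * seconds_per_short / 60


def budget_share(shorts_per_week: int, seconds_per_short: float) -> float:
    """Fraction of the monthly credit pool consumed (1.0 means fully used)."""
    minutes = monthly_audio_minutes(shorts_per_week, seconds_per_short)
    return minutes * CREDITS_PER_AUDIO_MINUTE / CREDITS_PER_MONTH


if __name__ == "__main__":
    for seconds in (30, 60):
        share = budget_share(20, seconds)
        print(f"20 shorts/week at {seconds}s each: {share:.0%} of monthly credits")
```

Under these assumptions, 20 shorts/week uses roughly 43% of the pool at 30s each and roughly 87% at 60s each, so retakes and voice previews would eat into the remaining headroom at the longer end.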

2026-03-18

UFO Pipeline - Hedra locked as production lip sync tool. Sessions 8-9 A/B tested Hedra Character-3 vs OmniHuman 1.5 (via fal.ai) vs VEED Fabric 1.0 (via fal.ai). Hedra won on 2D art style preservation, which is the critical metric. Cost breakdown: Hedra Character-3 uses 3 credits/sec at 540p, 6 credits/sec at 720p. Basic tier ($15/mo) = ~2,000 credits, Creator tier ($30/mo) = ~4,000 credits, Professional tier ($60/mo) = ~12,000 credits. Creator at $30/mo is the sweet spot for 20+ shorts/week. OmniHuman and Fabric are viable fallbacks if Hedra becomes unavailable.
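The tier comparison above can be turned into seconds of lip sync per month with a short sketch. The rates and tier sizes are the approximate figures from this note; the function names and the monthly short count are assumptions for illustration only.

```python
# Approximate figures from the note above; not verified against Hedra's
# current pricing page.
CREDITS_PER_SECOND = {"540p": 3, "720p": 6}
TIER_CREDITS = {"Basic": 2_000, "Creator": 4_000, "Professional": 12_000}


def lip_sync_seconds(tier: str, resolution: str) -> float:
    """Seconds of Character-3 output a tier's monthly credits cover."""
    return TIER_CREDITS[tier] / CREDITS_PER_SECOND[resolution]


def seconds_per_short(tier: str, resolution: str, shorts_per_month: float) -> float:
    """Average lip-sync seconds available per short at a given output volume."""
    return lip_sync_seconds(tier, resolution) / shorts_per_month


if __name__ == "__main__":
    shorts_per_month = 20 * 52 / 12  # assumed: 20 shorts/week averaged over a year
    for tier in TIER_CREDITS:
        total = lip_sync_seconds(tier, "540p")
        each = seconds_per_short(tier, "540p", shorts_per_month)
        print(f"{tier}: {total:.0f}s/month at 540p, about {each:.1f}s per short")
```

With these numbers the Creator tier covers about 1,333 seconds of 540p lip sync per month, or roughly 15 seconds per short at 20 shorts/week. Whether that is enough depends on how much of each short is talking-head footage versus b-roll, which this note does not state.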

UFO Pipeline Session 6 Summary (2026-03-18): Phase 1.5 Grey proof of concept, major progress.
LORA TRAINING: Trained Grey custom LoRA on fal.ai ($1.76, 58 training images from Krea base poses + Gemini outfit variants). Trigger word: grey_kael. Generated 4 test scenes (desert-landing, mib-alley, lab-examination, pope-ceremony) using Grey LoRA + HRDFLS style LoRA stack. Character identity and 2D art style hold across all scenes and outfits.
HAILUO VIDEO A/B TEST: Tested two approaches via fal.ai. (A) Image-to-video using Flux-generated stills as first frame: works well, minor face/body morphing on camera pans but acceptable for b-roll. (B) Subject reference with portrait: fails completely with 2D illustrated characters ("Unprocessable Entity"). Pipeline decision locked: all Hailuo usage will be image-to-video with Flux-composed first frames. Best for slow camera moves and atmospheric shots where character is mostly stationary. Complex body motion (walking, turning) causes morphing.
ELEVENLABS VOICE: Set up API (Creator plan). Designed Grey's voice via Voice Design v3 ("Soft, quiet male voice with an otherworldly calm. Slightly breathy, unnervingly gentle."). Generated 9 previews, Dave picked winner. Voice saved as "Grey (Kael)", permanent voice ID: mTnkD8SvErH27JUwwM1J. Test voiceover clip generated at characters/voices/grey_test_voiceover.mp3.
HEDRA BLOCKED: Basic tier ($15/mo) doesn't include API access, needs Creator tier ($30/mo) minimum. Web app also returning "failed to fetch" errors (service issue). hedra-node SDK v0.1.2 is outdated, points to deprecated mercury.dev.dream-ai.com instead of production api.hedra.com/web-app/public. SDK env var expects X_API_KEY not HEDRA_API_KEY. Hedra test deferred until service stabilizes and tier is upgraded.
KEY DECISIONS: Hailuo subject reference is dead for this project (2D art incompatible). Visual pipeline structure: Flux generates composed stills with LoRA (character identity baked in), Hailuo animates them (subtle motion only), Hedra handles talking head lip-sync (pending validation). Dave swapped grey_default_front.png portrait to a cleaner version for Hedra input.
REMAINING PHASE 1.5: Hedra talking head test, sample script, Whisper timestamps, b-roll generation, Remotion assembly.
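The tool split described in the visual pipeline structure above can be sketched as a small routing function. This is an illustration only: the shot categories, step names, and the still-only fallback for complex motion are hypothetical and not part of any existing pipeline code.

```python
from enum import Enum


class Shot(Enum):
    TALKING_HEAD = "talking_head"
    BROLL_SUBTLE = "broll_subtle"    # slow camera move, character mostly stationary
    BROLL_COMPLEX = "broll_complex"  # walking, turning, other large body motion


def plan_shot(shot: Shot) -> list[str]:
    """Return the ordered tool steps for a shot, following the session decisions."""
    if shot is Shot.TALKING_HEAD:
        # Hedra lip-syncs a portrait to ElevenLabs audio (pending validation).
        return ["elevenlabs_tts", "hedra_lip_sync"]
    if shot is Shot.BROLL_SUBTLE:
        # Hailuo is used image-to-video only, with a Flux + LoRA first frame.
        return ["flux_still_with_lora", "hailuo_image_to_video"]
    # Assumption: complex motion morphs in Hailuo, so keep the shot as a still.
    return ["flux_still_with_lora"]


if __name__ == "__main__":
    for shot in Shot:
        print(shot.value, "->", " + ".join(plan_shot(shot)))
```

Keeping the routing in one place means that a change of tool, for example falling back to OmniHuman or Fabric for lip sync, touches a single function rather than every script that builds a short.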

People: Dave