# Lunova Player — Customer Test v1 Production Notes

**Source:** `/n/apps/Apprecord.mp4` (10:00, 355 MB)
**Output:** `/n/apps/outgoing/lunova-test-customer/lunova-revoiced-v1.mp4` (6:14, 35 MB)
**Date:** 2026-05-01

---

## Pipeline that ran

1. **ffmpeg audio extract** — 16kHz mono WAV (3-10s)
2. **ffmpeg keyframe sampling** — 12 frames evenly across 10 min (instant)
3. **AIML Gemma 4 31B vision analysis** (24s, $0.0009)
   - Identified app: Lunova Player (login, profile, settings, home, live TV/sports, movies)
   - Identified crash frames: 0s (OBS setup), 300s (loading hang), 350s (crash dialog), 400s (blank), 550s (loading)
   - Recommended cuts: 0-50s, 300-420s, 540-600s
4. **Whisper large-v3 transcription on RTX 4090** (~60s on first run, includes 2.88GB model download)
   - 470-word transcript
   - Mistranscribed brand as "Lenovo" throughout
   - Hallucinated Spanish/Vietnamese gibberish at end (during crashed loading section)
   - Captured profanity from frustration moments
5. **AIML Gemma 4 31B script polish** (10s, $0.0007)
   - Fixed brand: "Lunova Player" everywhere
   - Removed all profanity
   - Removed Whisper artifacts (repetitive loops)
   - Reorganized into logical demo flow
   - Constrained to features visible in the kept footage (no Movies walkthrough, since that section was cut for crashes)
6. **W-TTS-PREP normalization** (24s, $0.0007)
   - Brand context: Lunova Player pronounced "Loo-NOH-vah Player"
7. **VoxCPM-2 voiceover render** (3 chunks sequential, ~25 min total)
   - Mark's voice (`mark base sample.wav` reference + cached transcript)
   - Chunks: 44.5s + 50.1s + 55.7s ≈ 150s total VO
8. **HyperFrames intro card** (~80s render) — "Lunova Player / Streaming Reimagined" wordmark reveal
9. **HyperFrames outro card** (~80s render) — closing logo + "Stream the World" tagline
10. **ffmpeg cuts** — segment-A (50-300s, 250s) + segment-B (420-540s, 120s)
11. **ffmpeg final compose** — intro + seg-A + seg-B + outro with 0.5s fade transitions, color grade (saturation 0.95, contrast 1.05), VO mux at t=3s offset
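The timing arithmetic behind steps 1, 2 and 10 can be sketched as follows. This is a minimal sketch, not the script that ran. It assumes three things. First, the 12 keyframes sit at t = 0, 50, ..., 550 s, which matches the frame times Gemma reported. Second, Gemma's cut ranges are the only input to the cut step. Third, segments are re-encoded rather than stream-copied. The helper names (`keyframe_times`, `keep_segments`, `audio_extract_args`, `frame_grab_args`, `trim_args`) and the output filenames are hypothetical.

```python
# Sketch of the timing math behind pipeline steps 1, 2 and 10.
# All helper names are hypothetical; this is not the script that ran.

SOURCE = "/n/apps/Apprecord.mp4"
DURATION_S = 600.0  # 10:00 source recording
CUTS = [(0.0, 50.0), (300.0, 420.0), (540.0, 600.0)]  # Gemma's recommended cuts (s)


def keyframe_times(duration_s: float, n: int) -> list[float]:
    """n evenly spaced sample points starting at t=0 (step = duration / n)."""
    step = duration_s / n
    return [i * step for i in range(n)]


def keep_segments(cuts: list[tuple[float, float]], duration_s: float) -> list[tuple[float, float]]:
    """Complement of the (start, end) cut ranges within [0, duration_s]."""
    keeps = []
    cursor = 0.0
    for start, end in sorted(cuts):
        if start > cursor:
            keeps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration_s:
        keeps.append((cursor, duration_s))
    return keeps


def audio_extract_args(src: str, dst: str) -> list[str]:
    """Step 1: drop video, downmix to mono, resample to 16 kHz, 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", "-ar", "16000",
            "-c:a", "pcm_s16le", dst]


def frame_grab_args(src: str, t: float, dst: str) -> list[str]:
    """Step 2: write the single frame at time t (seek placed before -i)."""
    return ["ffmpeg", "-y", "-ss", f"{t:g}", "-i", src, "-frames:v", "1", dst]


def trim_args(src: str, start: float, end: float, dst: str) -> list[str]:
    """Step 10: seek to start, read (end - start) seconds, re-encode.

    Re-encoding (assumed here) keeps the cut on the requested frame instead of
    the nearest source keyframe, which is where `-c copy` would typically land.
    """
    return ["ffmpeg", "-y", "-ss", f"{start:g}", "-t", f"{end - start:g}", "-i", src,
            "-c:v", "libx264", "-c:a", "aac", dst]


if __name__ == "__main__":
    print([f"{t:g}" for t in keyframe_times(DURATION_S, 12)])
    for label, (s, e) in zip("AB", keep_segments(CUTS, DURATION_S)):
        print(f"segment-{label}: {s:g}-{e:g}s ({e - s:g}s)")
        print(" ".join(trim_args(SOURCE, s, e, f"segment-{label}.mp4")))
```

Deriving the kept ranges as the complement of the cut list means a finer-grained Gemma pass only has to return more cut ranges. The segment list and trim commands then follow automatically.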

## Decisions made

- **Tier:** 2 (audio polish + visual chrome + cut crashes + intro/outro cards)
- **Voice:** Mark's stock persona (operator's voice)
- **Polish vs verbatim:** polished (transcript was rough with profanity + crashes)
- **Brand pronunciation:** "Lunova Player" (W-TTS-PREP enforced phonetic)
- **Color grade:** subtle (saturation 0.95, contrast 1.05) — kept screen recording natural
- **No music bed in v1** — ComfyUI restart pending to activate InspireMusic + StableAudioSampler nodes
- **No word-level captions** — would require W-WHISPERX wire build for kinetic emphasis
- **Aspect ratio:** kept 1920x1080 horizontal (matches source — could be vertical-cropped for social if desired)
- **Cut decisions:** trusted Gemma's first-pass identification of crash zones; could do finer-grained pass with more keyframes if needed
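The color grade and the optional social reframe both reduce to an ffmpeg `-vf` filter string. This is a minimal sketch under three assumptions. The grade is assumed to use ffmpeg's `eq` filter, since the notes give only the values. The vertical variant is assumed to be a centered full-height crop rather than a reframed edit. The `scale=1080:1920` output size is also an assumption. The helpers `grade_filter`, `vertical_crop_filter` and `filter_chain` are hypothetical.

```python
# Sketch: build the -vf filter string for the v1 grade plus an optional
# centered 9:16 crop. Helper names are hypothetical.


def grade_filter(saturation: float = 0.95, contrast: float = 1.05) -> str:
    """ffmpeg `eq` filter; 1.0 is neutral for both saturation and contrast."""
    return f"eq=saturation={saturation:g}:contrast={contrast:g}"


def vertical_crop_filter(width: int, height: int) -> str:
    """Centered full-height 9:16 crop, width rounded down to an even number
    (4:2:0 chroma subsampling needs even dimensions)."""
    crop_w = int(height * 9 / 16) // 2 * 2
    x = (width - crop_w) // 2
    return f"crop={crop_w}:{height}:{x}:0"


def filter_chain(*filters: str) -> str:
    """Join non-empty filters into one comma-separated chain."""
    return ",".join(f for f in filters if f)


if __name__ == "__main__":
    horizontal = filter_chain(grade_filter())
    vertical = filter_chain(grade_filter(), vertical_crop_filter(1920, 1080), "scale=1080:1920")
    print(horizontal)  # eq=saturation=0.95:contrast=1.05
    print(vertical)    # eq=saturation=0.95:contrast=1.05,crop=606:1080:657:0,scale=1080:1920
```

For a 1920×1080 source, the exact 9:16 width would be 607.5 px. Rounding down to 606 keeps the crop encodable and distorts the aspect ratio only negligibly after scaling.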

## Known v1 limitations (per substrate-honest assessment)

1. **VO-to-visual sync may drift in places.** VO is 150s of continuous narration; video is 374s with natural slow menu navigation. The VO describes features in a logical order; the video shows them at the recorder's pace. Could feel slightly "ahead" or "behind" at moments where a specific click is described.
2. **Tone shift from tutorial → marketing.** The polished script reads more like "discover the ultimate streaming experience" than "let me show you how to use this." If the customer wants a strict tutorial tone, the script can be re-prompted.
3. **220s of "silent visual" remaining after VO ends.** Pure screen-recording playback after the 150s VO completes. Professional touch (lets visuals breathe) but if the customer wants more narration coverage, we can either lengthen the script or chunk + place strategically across both segments.
4. **No word-level kinetic captions** — segment-level captions could be added; word-level requires W-WHISPERX wire.
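The ~220 s silent tail in item 3 follows from the durations recorded above. A small sketch of that arithmetic, assuming the three VO chunks play back-to-back from the 3 s mux offset. `spread_starts` is one hypothetical placement rule for the "place strategically across both segments" option. It gives every chunk an equal silent gap after it, which is not a method used in v1.

```python
# Sketch: VO coverage arithmetic for v1 and one possible placement rule.
# Assumes the VO chunks play back-to-back from the 3 s mux offset.

VIDEO_S = 374.0                   # 6:14 final cut
VO_OFFSET_S = 3.0                 # VO mux offset
VO_CHUNKS_S = [44.5, 50.1, 55.7]  # rendered chunk lengths


def silent_tail_s(video_s: float, offset_s: float, chunks: list[float]) -> float:
    """Seconds of picture left after back-to-back chunks finish."""
    return video_s - (offset_s + sum(chunks))


def spread_starts(video_s: float, offset_s: float, chunks: list[float]) -> list[float]:
    """Hypothetical rule: leave an equal silent gap after every chunk."""
    gap = silent_tail_s(video_s, offset_s, chunks) / len(chunks)
    starts, t = [], offset_s
    for length in chunks:
        starts.append(t)
        t += length + gap
    return starts


if __name__ == "__main__":
    print(f"silent tail: {silent_tail_s(VIDEO_S, VO_OFFSET_S, VO_CHUNKS_S):.1f}s")
    print(f"VO coverage: {sum(VO_CHUNKS_S) / VIDEO_S:.0%}")
    print([round(s, 1) for s in spread_starts(VIDEO_S, VO_OFFSET_S, VO_CHUNKS_S)])
```

An even spread is the simplest rule. In practice, chunk starts would more likely be pinned to the visual moments each chunk describes, as in the tighter-sync path.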

## Customer-iteration paths (if v1 needs revision)

- **Revoice with different tone:** re-prompt AIML 31B with "tutorial walkthrough tone, less marketing" → re-render via VoxCPM (~25 min for new chunks).
- **Tighter VO/visual sync:** use Whisper segment timestamps (saved at `transcript-segments.json`) to align polished VO chunks to specific visual moments. Adds ~30 min.
- **Add music bed:** post ComfyUI restart, ACE-Step generates brand-mood-matched music. Adds ~5-10 min.
- **Multi-format export (vertical 9:16, square 1:1):** ~3-5 min ffmpeg pass, no re-render needed.
- **Cut additional crash sub-segments:** if reviewer spots more crashes inside kept segments, finer-grained Gemma pass + ffmpeg surgical cuts.
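For the tighter-sync path, Whisper's segment timestamps refer to the source timeline, so they must be mapped onto the cut before VO chunks can be aligned to them. A minimal sketch under three assumptions. `transcript-segments.json` is assumed to be a list of objects with `start`, `end` (seconds) and `text` fields, the shape of openai-whisper's `segments` output. The intro card length is assumed to be supplied by the caller, since these notes do not state it. The 0.5 s fades are ignored; if they were overlapping crossfades, each would pull later material about 0.5 s earlier. `source_to_output` and `mapped_segments` are hypothetical helpers.

```python
# Sketch: map Whisper segment times (source timeline) onto the v1 cut.
# Assumes transcript-segments.json is a list of {"start", "end", "text"}
# objects in seconds, and that the intro length is supplied by the caller.
import json
from pathlib import Path
from typing import Optional

KEEPS = [(50.0, 300.0), (420.0, 540.0)]  # segment-A, segment-B (source seconds)


def source_to_output(t: float, keeps=KEEPS, intro_s: float = 0.0) -> Optional[float]:
    """Output-timeline position of source time t, or None if t was cut.

    Ignores any crossfade overlap between clips.
    """
    elapsed = intro_s
    for start, end in keeps:
        if start <= t < end:
            return elapsed + (t - start)
        elapsed += end - start
    return None


def mapped_segments(path: Path, intro_s: float) -> list[dict]:
    """Keep only segments whose start survives the cut, with output-timeline starts."""
    segments = json.loads(Path(path).read_text(encoding="utf-8"))
    mapped = []
    for seg in segments:
        out_start = source_to_output(seg["start"], intro_s=intro_s)
        if out_start is not None:
            mapped.append({"out_start": out_start, "text": seg["text"].strip()})
    return mapped
```

Usage would look like `mapped_segments(Path("transcript-segments.json"), intro_s=3.0)`. The value 3.0 assumes, without verification, that the intro card runs about as long as the 3 s VO offset. Each polished VO chunk could then be placed at the `out_start` of the raw segment it paraphrases.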

## Cost summary

- AIML 31B vision: $0.0009
- AIML 31B script polish: $0.0007
- AIML 31B HF intro author: ~$0.0007
- AIML 31B HF outro author: ~$0.0005
- AIML 31B W-TTS-PREP: $0.0007
- **Total cloud spend: ~$0.0035** (under one cent)
- Local compute (electricity only): Whisper transcription, 3× VoxCPM renders, 2× HyperFrames renders. All Gemma calls ran in the cloud and are costed above.
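The cloud total is simple addition, but sub-cent amounts are easiest to tally exactly with `Decimal`, because binary floats can pick up trailing rounding noise. A minimal sketch. The two HyperFrames authoring items are approximate ("~") in the list above, so the total inherits that approximation.

```python
# Sketch: tally the cloud line items with Decimal so sub-cent amounts add exactly.
from decimal import Decimal

CLOUD_COSTS_USD = {
    "31B vision": Decimal("0.0009"),
    "31B script polish": Decimal("0.0007"),
    "31B HF intro author": Decimal("0.0007"),  # approximate in the notes
    "31B HF outro author": Decimal("0.0005"),  # approximate in the notes
    "31B W-TTS-PREP": Decimal("0.0007"),
}

total = sum(CLOUD_COSTS_USD.values(), Decimal("0"))
print(f"total cloud spend: ${total}")  # -> total cloud spend: $0.0035
```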

## Time summary

- Stage 1 equivalent (transcribe + vision + correction prep): ~5 min wall-clock
- Stage 2 equivalent (polish + render + compose): ~30 min wall-clock (dominated by VoxCPM 25 min)
- **Total: ~35 min** for a 10-min source video. That is above the 12-20 min projection for production-grade Tier 2. Runs should fall within it once VoxCPM render times stabilize. Today's 25 min for 3 chunks was longer than typical, possibly due to first-call cold-load plus queue contention.
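Nearly all of the gap between this run and the projection comes from VoxCPM. The following is a back-of-envelope budget built only from the rounded figures in these notes, so treat the outputs as estimates. RTF (real-time factor) here means wall-clock render seconds per second of rendered audio. The sketch also assumes the other stages stay at today's ~10 min combined.

```python
# Back-of-envelope time budget from the rounded figures in these notes.
VO_AUDIO_S = 150.0        # rendered VO length (~150 s)
VOXCPM_WALL_S = 25 * 60   # today's 3-chunk VoxCPM render
TOTAL_WALL_S = 35 * 60    # whole run
TARGET_MAX_S = 20 * 60    # upper end of the 12-20 min projection

rtf_today = VOXCPM_WALL_S / VO_AUDIO_S            # render seconds per second of audio
other_stages_s = TOTAL_WALL_S - VOXCPM_WALL_S     # everything except VoxCPM
voxcpm_budget_s = TARGET_MAX_S - other_stages_s   # VoxCPM time that still fits the target
rtf_needed = voxcpm_budget_s / VO_AUDIO_S

print(f"today: RTF {rtf_today:.1f}; to hit 20 min: VoxCPM <= {voxcpm_budget_s / 60:.0f} min (RTF {rtf_needed:.1f})")
# -> today: RTF 10.0; to hit 20 min: VoxCPM <= 10 min (RTF 4.0)
```

At the 12 min lower end, the same arithmetic leaves only about 2 min for VoxCPM (RTF ~0.8) unless the other ~10 min of stages also shrink.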

## Files saved

- Source: `/n/apps/Apprecord.mp4`
- Output: `/n/apps/outgoing/lunova-test-customer/lunova-revoiced-v1.mp4`
- Working dir: `C:\Users\marka\AppData\Local\Temp\lunova-revoice\` (transcript, vision analysis, scripts, chunks, segment cuts)