feat: add asset preferences, video research, and Remotion ad assets
- Add thumbs-down feedback modal and preference API endpoint
- Add AI UGC video platforms research doc
- Add ReflectAd Remotion composition with public flow assets
- Add gemini-ad-designer and poster-ad-designer pipeline skills
- Add research_reflect_v1.1 pipeline script

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,762 @@
# AI UGC Video Generation Platforms Research 2025-2026

## Realistic "Person Using Phone" Lifestyle Video Analysis

**Research Date**: March 2026

**Focus**: Platforms for realistic video clips of people naturally interacting with phones/tablets (NOT talking-head testimonials)

---

## EXECUTIVE SUMMARY

For your specific use case—realistic lifestyle videos of people naturally using apps on phones (checking mood apps, couples looking at screens, tapping before bed, showing phones to family)—**the landscape is fragmented**:

- **Text-to-video models** (Runway, Kling, Google Veo, Sora) can generate general "person using phone" scenarios from text prompts but require careful prompt engineering
- **Avatar platforms** (HeyGen, Synthesia, D-ID) excel at talking-head presenters, NOT lifestyle interaction videos
- **Specialized UGC platforms** (MakeUGC, Creatify, Arcads) can make realistic people holding products but have limited "phone interaction" capabilities
- **Phone mockup tools** (Mockey, Rotato, FlexClip) handle app screen display but lack realistic human actors

**Best Match for Your Use Case**: A combination approach using Runway Gen-4.5 or Google Veo 3.1 for lifestyle generation + a phone mockup tool for screen display integration.

---

## DETAILED PLATFORM ANALYSIS
### 1. RUNWAY GEN-4 / GEN-4.5

- **Phone Interaction Capability**: ⭐⭐⭐⭐⭐ (Excellent)
- **API Access**: ⭐⭐⭐⭐⭐ (Yes, fully supported)
- **Diverse Cast**: ⭐⭐⭐⭐ (Via detailed prompts)
- **Overall Fit**: ⭐⭐⭐⭐⭐ (BEST OPTION for general "person using phone" videos)

**What It Does Well**:
- **Character & Scene Consistency**: Gen-4 maintains consistent characters across multiple shots
- **Physics Simulation**: Realistic weight, momentum, motion—crucial for natural phone interactions
- **Camera Control**: Advanced camera movements (zoom, arc, trucking)
- **Gen-4.5 Performance**: Released December 2025, now #1 on Artificial Analysis Text-to-Video benchmark with 1,247 Elo points

**Can It Do Your Use Cases?**
- ✅ Person checking phone at breakfast and smiling
- ✅ Couple looking at phone together on couch (with proper prompting)
- ✅ Someone tapping phone quickly before bed
- ✅ Parent showing teen something on phone

**API Details**:
- Native API with modern documentation
- Generation speed: 5-8 second videos in ~60 seconds (5x faster than Gen-4)
- Supports text-to-video and image-to-video
- Available via Runway's official API

**Pricing**:
- No official per-video pricing published
- Credit-based system through third-party APIs (CometAPI, AIML API, etc.)
- Estimated: $0.25-$0.50 per 8-second video through aggregator APIs
- Enterprise/volume discounts available

**Node.js/TypeScript Integration**:
- Native Node.js SDK available: `npm install @runwayml/sdk`
- REST API with standard authentication
- Can be integrated into automated pipelines
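
Video-generation APIs such as Runway's run each job as an asynchronous task, so an automated pipeline has to poll until the task finishes, or use a webhook where one is offered. The following is a minimal, SDK-agnostic sketch. The `TaskStatus` shape and the `fetchStatus` callback are assumptions: you would map them onto the provider's real task-status endpoint. They are not Runway's response schema.

```typescript
// Normalized task status (an assumption); adapt each provider's response to it.
type TaskStatus =
  | { state: "pending" }
  | { state: "succeeded"; outputUrl: string }
  | { state: "failed"; reason: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Calls `fetchStatus` every `intervalMs` until the task succeeds or fails,
 * giving up after `timeoutMs`. Resolves to the output URL on success.
 */
async function waitForTask(
  fetchStatus: () => Promise<TaskStatus>,
  { intervalMs = 5000, timeoutMs = 10 * 60 * 1000 } = {},
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await fetchStatus();
    if (status.state === "succeeded") return status.outputUrl;
    if (status.state === "failed") throw new Error(`Generation failed: ${status.reason}`);
    if (Date.now() >= deadline) throw new Error("Timed out waiting for generation");
    await sleep(intervalMs);
  }
}

// Usage with a fake status source that finishes on the third poll.
let polls = 0;
const fakeStatus = async (): Promise<TaskStatus> =>
  ++polls < 3 ? { state: "pending" } : { state: "succeeded", outputUrl: "https://example.com/clip.mp4" };

waitForTask(fakeStatus, { intervalMs: 10 }).then((url) => console.log(url, `after ${polls} polls`));
```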

**Quality**: Extremely high: bleeding-edge photorealism, best for lifestyle sequences

---

### 2. GOOGLE VEO 3 / VEO 3.1

- **Phone Interaction Capability**: ⭐⭐⭐⭐⭐ (Excellent)
- **API Access**: ⭐⭐⭐⭐ (Yes, via Gemini API)
- **Diverse Cast**: ⭐⭐⭐⭐ (Better with reference images)
- **Overall Fit**: ⭐⭐⭐⭐⭐ (EXCELLENT, comparable to Runway)

**What It Does Well**:
- **Native Audio Generation**: Generates synchronized audio alongside video
- **Human Face Generation**: Veo 3.1 can generate realistic human faces when provided references (advantage over Sora)
- **Image-to-Video**: Enhanced capabilities for maintaining character consistency
- **October 2025 Release**: Latest production model with high-fidelity outputs

**Can It Do Your Use Cases?**
- ✅ All four use cases, as with Runway, plus synchronized audio
- ✅ Better for complex scenes with multiple people (e.g., someone showing their phone to family members)

**API Details**:
- Available via Gemini API (Google's unified API)
- Pricing available on Vertex AI platform
- Can integrate with Google Cloud Platform workflows

**Pricing**:
- Vertex AI: $0.40 per second (standard), $0.15 per second (faster model)
- For a 30-second video: ~$12 (standard) or ~$4.50 (faster)
- Gemini API: Different pricing tier (check latest)
- Free preview tier available for experimentation
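
With per-second pricing, cost is just the number of seconds multiplied by the rate. Below is a minimal sketch that uses the Vertex AI rates quoted above. The constants are a snapshot from this research, so check them against Google's current pricing before budgeting.

```typescript
// Vertex AI Veo rates quoted above, in USD per generated second (snapshot; prices change).
const VEO_RATE_PER_SECOND = { standard: 0.4, fast: 0.15 } as const;

type VeoTier = keyof typeof VEO_RATE_PER_SECOND;

/** Estimated USD cost for `seconds` of generated video, rounded to cents. */
function estimateVeoCost(seconds: number, tier: VeoTier = "standard"): number {
  if (!(seconds > 0)) throw new RangeError("seconds must be a positive number");
  return Math.round(seconds * VEO_RATE_PER_SECOND[tier] * 100) / 100;
}

console.log(estimateVeoCost(30));         // 12
console.log(estimateVeoCost(30, "fast")); // 4.5
console.log(estimateVeoCost(10));         // 4
```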

**Node.js/TypeScript Integration**:
- Google Cloud Node.js client libraries available; the Gemini API also has the Google Gen AI SDK (`npm install @google/genai`)
- Standard REST API access
- Integrates with existing GCP infrastructure

**Quality**: Very high, with better audio sync than Runway. Strong for family/couple scenarios.

---
### 3. SORA (OPENAI)

- **Phone Interaction Capability**: ⭐⭐⭐⭐ (Very Good)
- **API Access**: ⭐⭐ (NOT AVAILABLE - major limitation)
- **Diverse Cast**: ⭐⭐⭐ (Possible with prompts)
- **Overall Fit**: ⭐⭐ (NOT SUITABLE for automation)

**Status**:
- Sora 2 released September 2025
- **No public API** as of January 2026
- WaveSpeedAI offers unofficial Sora 2 API access (not directly supported by OpenAI)
- January 2026 change: Free users can no longer generate—Plus ($20/mo) and Pro ($200/mo) only

**Capabilities**:
- Can generate professional-quality videos up to 25 seconds with synchronized dialogue
- More "physically accurate and realistic" than earlier models
- Can handle complex human interactions

**Why Not Suitable**:
- No direct API access from OpenAI
- Relies on web app or unofficial third-party APIs
- Can't be directly integrated into automated pipelines
- Subscription-locked (no free tier)

**Recommendation**: Skip for your automation needs.

---
### 4. KLING AI 3.0

- **Phone Interaction Capability**: ⭐⭐⭐⭐ (Very Good)
- **API Access**: ⭐⭐⭐⭐ (Yes, via multiple providers)
- **Diverse Cast**: ⭐⭐⭐⭐ (Strong)
- **Overall Fit**: ⭐⭐⭐⭐ (GOOD alternative to Runway/Veo)

**What It Does Well**:
- **Physics Accuracy**: Simulates gravity, balance, inertia for believable movement
- **Face Stability**: Characters remain consistent across frames (February 2026 launch solved this major pain point)
- **Element Library**: Upload reference images to ensure characters stay consistent across shots
- **Audio Sync**: Native audio with video for up to 5 minutes

**Can It Do Your Use Cases?**
- ✅ Person checking phone at breakfast
- ✅ Couple looking at phone together
- ✅ Tapping phone before bed
- ✅ Parent/teen scenarios (with reference images for consistency)

**Kling 3.0 Specifics** (Unified multimodal video engine):
- Cinema-grade visuals
- Physics-accurate motion
- Native audio sync
- Released February 2026

**API Access**:
- Multiple third-party providers: fal.ai, Runware, WaveSpeedAI, PiAPI
- Element Library feature available for character consistency
- Supports text-to-video and image-to-video

**Pricing**:
- Variable by provider, but generally affordable (cheaper than Runway/Veo)
- fal.ai: Pay-per-use model (check current rates)
- Estimated: $0.10-$0.30 per video through aggregators

**Node.js/TypeScript Integration**:
- Available through fal.ai SDK (`npm install @fal-ai/client`)
- REST API through aggregator platforms
- Straightforward integration

**Quality**: Very high, especially after 3.0 launch. Excellent value for cost.

---
### 5. HEYGEN (Avatar-Based)

- **Phone Interaction Capability**: ⭐⭐ (Limited)
- **API Access**: ⭐⭐⭐⭐⭐ (Excellent)
- **Diverse Cast**: ⭐⭐⭐ (100+ avatars available)
- **Overall Fit**: ⭐⭐ (NOT IDEAL - focused on talking heads)

**Problem**: HeyGen specializes in **avatar presenters speaking to camera**, NOT lifestyle interactions.

**Latest Features (February 2026)**:
- Avatar IV with motion-captured avatars
- Timing-aware hand gestures
- Micro-expressions (natural blinks, subtle smiles)
- Redesigned homepage
- ChatGPT integration
- Video Agent API (new)

**Avatar IV Performance**:
- Full-body avatars with realistic lip-sync
- Hand gesture timing
- Micro-expressions
- Digital Twin feature (create a version of yourself)

**When It Might Work**:
- You could script an avatar using a phone, but the result looks very artificial
- Better for product explainers where the avatar talks about the app

**API Details**:
- Video Agent API: prompt-to-video workflows
- REST API with Node.js support
- Endpoints for video generation, translation, and LiveAvatar streaming

**Pricing**:
- API starts at $99/month
- Credit-based: 1 credit = 1 minute avatar video (standard)
- Avatar IV uses 1 credit per 10 seconds (~6 credits/minute)
- Video Agent: ~2 credits per minute
- Translation: 3 credits per minute of source video
- Pro tier: $0.99/credit, Scale tier: $0.50/credit

**Recommendation**: Use only if you want talking-head explainer videos about the app, NOT lifestyle interaction videos.

---
### 6. SYNTHESIA (Avatar-Based)

- **Phone Interaction Capability**: ⭐⭐ (Limited)
- **API Access**: ⭐⭐⭐ (Yes, Creator plan+)
- **Diverse Cast**: ⭐⭐⭐⭐ (160+ avatars, real actors)
- **Overall Fit**: ⭐⭐ (NOT IDEAL - talking head focused)

**What It Does**:
- Express-2 engine: full-body avatars with gestures, pointing, waving
- All avatars based on real actors (paid consent model)
- Facial micro-expressions matching emotional tone
- 160+ languages supported

**API Access**:
- Creator plan: $64/month (billed yearly)
- Includes API access with rate limits
- Webhook integration for automated workflows

**Pricing**:
- **Free**: 36 minutes/year
- **Starter**: $18/month (billed annually)
- **Creator**: $64/month (billed annually), includes API
- **Enterprise**: Custom pricing
- Credit system: 1 minute = 1 credit

**Node.js/TypeScript Integration**:
- REST API with Node.js support
- Webhook integration for async workflows
- Standard authentication

**Why Not Ideal**:
- Designed for presenters/training videos, not lifestyle interaction
- Avatars still feel "presenter-like" rather than casual
- Better for corporate than authentic UGC

**Recommendation**: Skip for this use case.

---
### 7. PIKA LABS 2.2

- **Phone Interaction Capability**: ⭐⭐⭐⭐ (Good)
- **API Access**: ⭐⭐⭐⭐ (Yes, via fal.ai)
- **Diverse Cast**: ⭐⭐⭐ (Text-prompt based)
- **Overall Fit**: ⭐⭐⭐ (Decent alternative)

**What It Does**:
- Text-to-video generation (Pika 2.2)
- Image-to-video (Pikascenes 2.2)
- Pikaframes 2.2: upload 5 keyframes, AI interpolates smooth motion
- Pikaformance: hyper-real expressions synced to audio (near real-time)

**API Access**:
- December 2025 announcement: Pika 2.2 now exposed via fal.ai
- API key through fal dashboard
- Text-to-video and image-to-video endpoints

**Use Cases**:
- ✅ Can generate "person using phone" via text prompts
- ✅ Pikaframes could help create consistent character across shots
- Less ideal than Runway/Veo for this specific use case

**Pricing**: Not clearly published; likely variable through fal.ai aggregator

**Quality**: Good, but less consistent character realism than Runway Gen-4.5

---
### 8. D-ID (Real-Time Avatar Video)

- **Phone Interaction Capability**: ⭐⭐ (Limited)
- **API Access**: ⭐⭐⭐⭐⭐ (Excellent - core product)
- **Diverse Cast**: ⭐⭐ (Limited to avatar variations)
- **Overall Fit**: ⭐⭐ (NOT suitable)

**New V4 Expressive Visual Agents (March 2026)**:
- Ultra-high-fidelity digital humans
- Real-time LLM-connected conversations
- Sub-0.5-second latency
- Up to 4K resolution
- Sentiment-aligned facial expressions
- Trained on real actor performances

**Best Use Case**:
- Customer support chatbots with realistic avatars
- Interactive training experiences
- NOT lifestyle video content

**Why Not Suitable**:
- Designed for talking-head interactions
- Real-time conversational focus
- Not for pre-recorded lifestyle scenarios

**Recommendation**: Skip for your use case.

---
### 9. TAVUS (Real-Time AI Humans)

- **Phone Interaction Capability**: ⭐⭐⭐ (Moderate)
- **API Access**: ⭐⭐⭐⭐ (Yes, with real-time capability)
- **Diverse Cast**: ⭐⭐ (Requires custom avatar creation)
- **Overall Fit**: ⭐⭐⭐ (Possible but expensive)

**What It Does**:
- Creates hyperrealistic AI replicas from a 2-minute video sample
- Phoenix-4 model: first real-time model with emotional states + active listening
- Models emotional states, facial expressions, and head movements as a unified system
- Millisecond-level latency

**Pricing**:
- Free plan: 25 min/month conversational, 5 min/month generation ($0 cost)
- Starter: ~$39-59/month
- Growth: 1,250 min/month conversational
- Overage: $0.37/min conversations, $0.32/min overage (Growth tier)
- Enterprise: Custom (resource-intensive, expensive)

**Use Cases**:
- ✅ Could generate video of a person using a phone if you create a custom avatar
- ✅ Real-time interaction capability (not needed for your use case)
- ❌ Expensive for batch video generation

**Why Less Ideal**:
- Designed for real-time conversational avatars
- Creating custom avatars is expensive
- Better for interactive experiences than pre-recorded lifestyle videos

---
### 10. MAKEUGC (Specialized UGC Platform)

- **Phone Interaction Capability**: ⭐⭐⭐ (Moderate)
- **API Access**: ⭐⭐⭐⭐ (Yes, Platform API)
- **Diverse Cast**: ⭐⭐⭐⭐ (100+ licensed AI avatars)
- **Overall Fit**: ⭐⭐⭐ (GOOD for avatar-based content)

**What It Does**:
- 100+ unique licensed AI avatars
- Avatar can realistically hold/showcase/consume products
- Testimonial and lifestyle shot generation
- Text script → AI avatar video transformation

**Key Feature**:
- Proprietary hand-holding technology: avatars can realistically hold products
- Could potentially adapt for "holding phone" scenarios

**API Details**:
- Platform API for programmatic video generation
- Authentication via API key
- Specify avatar, voice, script
- Processing time: 2-10 minutes for talking head videos
- 29 languages supported

**Pricing**:
- Under $10 per video (mentioned as cost comparison to $100-200 traditional UGC)
- Subscription required (exact tiers unclear from search)

**Node.js/TypeScript Integration**:
- REST API should be straightforward to integrate
- Check documentation at app.makeugc.ai/api/platform/documentation

**Use Cases**:
- ✅ Person holding phone showing it to others (good fit)
- ✅ Product holding = could adapt for phone
- ❌ More formal/structured than casual lifestyle
- ❌ Feels more like testimonial than authentic interaction

**Quality**: Good for product-focused UGC, less natural for casual lifestyle scenarios

---
### 11. CREATIFY

- **Phone Interaction Capability**: ⭐⭐⭐ (Moderate)
- **API Access**: ⭐⭐⭐⭐ (Yes, Business plan+)
- **Diverse Cast**: ⭐⭐⭐⭐ (1500+ hyper-realistic UGC avatars)
- **Overall Fit**: ⭐⭐⭐ (GOOD for avatar-based UGC)

**What It Does**:
- 1500+ hyper-realistic UGC avatars
- Aurora avatar model (state-of-the-art)
- Text-to-video, URL-to-video, image-to-video
- Custom templates, product videos, AI Shorts

**API Capabilities**:
- URL-to-video conversion
- AI avatar lip-sync
- Aurora image-to-video
- Custom templates
- Text-to-Speech

**Pricing**:
- Free: 10 credits (≈2 videos)
- Creator: $39/month billed monthly, or $33/month billed annually (50 credits/month)
- Business: $99/month = 250 credits/month + API access + priority support
- Enterprise: Custom with volume discounts
- Credit cost: 2-20 credits per video depending on quality

**Estimated Cost**:
- At Business tier: $99/250 credits = ~$0.40/credit
- 10-credit video = ~$4, 20-credit video = ~$8

**Node.js/TypeScript Integration**:
- REST API on Business plan
- Check docs.creatify.ai for API details

**Use Cases**:
- ✅ Person holding/showing phone
- ✅ Family/couple scenarios with different avatars
- ✅ Good diversity in avatar library
- ❌ May feel more "production" than authentic UGC

---
### 12. ARCADS.AI (Specialized UGC)

- **Phone Interaction Capability**: ⭐⭐ (Limited)
- **API Access**: ⭐⭐⭐⭐ (Yes, Enterprise+)
- **Diverse Cast**: ⭐⭐⭐⭐ (300+ actors from video footage)
- **Overall Fit**: ⭐⭐⭐ (Possible but not ideal)

**What It Does**:
- 300+ AI "actors" from real video footage (better body language than synthetic)
- TikTok-style UGC video ads
- Avatars can hold products and show apps
- B-rolls, music, captions, transitions auto-added

**Can It Handle Phone Interaction?**
- ✅ Can make avatar hold phone and show app
- ❌ Struggles with physical products, likely limited for realistic phone interaction

**API Details**:
- Enterprise plans include API access
- Trigger generation from briefs
- Auto-route to cloud storage

**Pricing**:
- Starter: $110/month = 10 videos/month = $11/video
- Creator: $220/month = 20 videos/month = $11/video
- Custom plans for volume + API access

**Why Less Ideal**:
- Platform struggles with physical product interactions
- More TikTok-ad focused than lifestyle
- Enterprise-only API (high minimum commitment)

---
## PHONE MOCKUP / APP SCREEN DISPLAY TOOLS

If you need to show actual phone screens, these complement AI video tools:

### Mockey.ai
- Phone mockup video generator
- Add your design, generate MP4 mockup
- Templates with realistic person holding phone
- Good for app screen display

### Rotato
- 3D device mockups
- Your own app/web designs on device screens
- High-quality visuals

### FlexClip
- Free phone mockup generator
- Display app screenshots on iPhone/Android backgrounds
- AI tools (object remover, voice generator)
- Integrated with video editor

### Placeit (by Envato)
- App mockup templates
- Animated device displays
- Professional quality

**Strategy**: Use an AI video generator for realistic people and combine it with a mockup tool for accurate phone screen display.

---
## RECOMMENDATION MATRIX

### For Your Specific Use Cases:

**Use Case: "Person checking phone at breakfast and smiling"**
- **Best**: Runway Gen-4.5 with detailed prompt
- **Alternative**: Google Veo 3.1
- **Budget**: Kling AI 3.0

**Use Case: "Couple looking at phone together on couch"**
- **Best**: Runway Gen-4.5 (multi-character consistency)
- **Alternative**: Google Veo 3.1
- **Budget**: Kling AI 3.0

**Use Case: "Someone tapping phone quickly before bed"**
- **Best**: Runway Gen-4.5 (precise motion)
- **Alternative**: Kling AI 3.0 (physics simulation)

**Use Case: "Parent showing teen something on phone"**
- **Best**: Runway Gen-4.5 or Google Veo 3.1 (multi-person interaction)
- **Alternative**: MakeUGC or Creatify (controlled avatar setup)

---

## IMPLEMENTATION ARCHITECTURE
### Option A: Text-to-Video Foundation (Recommended)

```typescript
// Runway Gen-4.5 approach.
// `runwayClient.generateVideo` is a hypothetical wrapper, not an official SDK
// method; it would submit a generation task and resolve to the output URL.
declare const runwayClient: {
  generateVideo(opts: { prompt: string; duration: number; quality?: string }): Promise<string>;
};

const prompt = `
A woman sits at her kitchen table with breakfast,
holding her phone. She glances at it, reads something
that makes her smile. Natural morning lighting. Shot
from medium distance, gentle camera movement.
`;

// Generate via Runway API
const video = await runwayClient.generateVideo({
  prompt,
  duration: 10,
  quality: "high",
});
```

**Pros**:
- Single generation step, no compositing needed
- High realism
- Character consistency
- Flexible scenarios

**Cons**:
- On-screen content is generated, so it won't show your actual app UI
- Prompt engineering required
- May need multiple generations for variations
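
Option A depends on prompt engineering and usually needs several generations per scenario, so it helps to build prompts from parameterized templates instead of hand-editing strings. Below is a minimal sketch. The parameter names (`subject`, `setting`, `lighting`) and the template wording are illustrative assumptions, not fields of any Runway API.

```typescript
// Illustrative template parameters (assumptions, not an API schema).
interface PromptParams {
  subject: string;
  setting: string;
  lighting: string;
}

/** Fills a fixed text-to-video template with one parameter set. */
function buildPrompt({ subject, setting, lighting }: PromptParams): string {
  return (
    `${subject} ${setting}, holding a phone and glancing at the screen. ` +
    `${lighting}. Medium shot, gentle handheld camera movement, authentic UGC style.`
  );
}

/** Cartesian product of the option lists: one PromptParams per combination. */
function expandVariations(options: { [K in keyof PromptParams]: string[] }): PromptParams[] {
  const combos: PromptParams[] = [];
  for (const subject of options.subject)
    for (const setting of options.setting)
      for (const lighting of options.lighting) combos.push({ subject, setting, lighting });
  return combos;
}

const prompts = expandVariations({
  subject: ["A woman in her 30s", "A retired man"],
  setting: ["at a kitchen table with breakfast", "on a sofa in the evening"],
  lighting: ["Soft natural light"],
}).map(buildPrompt);

console.log(prompts.length); // 4
```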
### Option B: Composite Approach

```typescript
// Hypothetical wrappers: these names are placeholders, not official SDK APIs.
type VideoRef = string; // e.g. a URL or local file path
declare const runwayClient: { generateVideo(opts: { prompt: string; duration: number }): Promise<VideoRef> };
declare const mockeyClient: { generateMockup(opts: { appScreenshot: string; template: string }): Promise<VideoRef> };
declare function compositeVideos(base: VideoRef, overlay: VideoRef): Promise<VideoRef>;
declare const moodAppScreenshot: string; // path or URL of your app screenshot

// Generate person using phone video
const personVideo = await runwayClient.generateVideo({
  prompt: "Woman checking her phone at breakfast, smiling",
  duration: 10,
});

// Create phone mockup with your actual app UI
const phoneVideo = await mockeyClient.generateMockup({
  appScreenshot: moodAppScreenshot,
  template: "hand_holding_phone",
});

// Composite them together (requires a video editing/compositing step)
const final = await compositeVideos(personVideo, phoneVideo);
```

**Pros**:
- Shows actual app UI
- Customizable
- Control over phone screen content

**Cons**:
- Requires video compositing
- More complex pipeline
- The screen overlay won't perfectly track the hand/phone position without motion tracking
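
In the compositing step, the app screenshot has to be scaled into the phone's on-screen rectangle without distorting the UI. Below is a minimal sketch of an aspect-preserving "contain" fit. It assumes you already know the screen rectangle for a frame (for example, from manual keyframes or a tracker). It also ignores perspective, which handheld footage usually needs; that is typically handled with a corner-pin transform in your compositing tool.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

/**
 * Scales a srcWidth x srcHeight image to fit entirely inside `target`,
 * preserving aspect ratio, and centers it. Returns the placement rectangle.
 */
function containFit(srcWidth: number, srcHeight: number, target: Rect): Rect {
  const scale = Math.min(target.width / srcWidth, target.height / srcHeight);
  const width = srcWidth * scale;
  const height = srcHeight * scale;
  return {
    x: target.x + (target.width - width) / 2,
    y: target.y + (target.height - height) / 2,
    width,
    height,
  };
}

// A 1000x2000 screenshot placed into a 300x500 screen region whose top-left is (100, 50).
console.log(containFit(1000, 2000, { x: 100, y: 50, width: 300, height: 500 }));
// { x: 125, y: 50, width: 250, height: 500 }
```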
### Option C: UGC Avatar Platform

```typescript
// Creatify approach - controlled but less flexible.
// `creatifyClient` and its field names are illustrative placeholders; see docs.creatify.ai for the real API.
declare const creatifyClient: { generateVideo(opts: Record<string, string>): Promise<string> };

const video = await creatifyClient.generateVideo({
  avatarId: "avatar_diverse_female_30s",
  script: "Let me show you our mood tracking app",
  voiceId: "natural_female_voice",
  backgroundTemplate: "modern_bedroom",
  productUrl: "https://yourapp.com",
});
```

**Pros**:
- Controlled, consistent output
- Diverse avatars available
- Quick generation

**Cons**:
- Less natural/authentic
- Limited "lifestyle" feel
- Feels more like testimonial

---

## FINAL RECOMMENDATION FOR YOUR PIPELINE
### Best Solution: **Runway Gen-4.5 + Optional Compositing**

**Why**:
1. **Highest Quality**: #1 on the Artificial Analysis Text-to-Video benchmark at its December 2025 release (1,247 Elo)
2. **API First**: Built for automation, excellent Node.js integration
3. **Handles All Use Cases**: Can generate realistic multi-person interactions, natural gestures, emotional micro-expressions
4. **Reasonable Pricing**: ~$0.25-$0.50 per 8-10 second video (through aggregators)
5. **Character Consistency**: Maintains the same person across shots and variations

**Integration Path**:

```typescript
import RunwayML from "@runwayml/sdk";

const runway = new RunwayML({
  apiKey: process.env.RUNWAY_API_KEY,
});

// Hypothetical helpers (not SDK methods). The SDK is task-based (e.g.
// `runway.imageToVideo.create(...)`), so `generateVideo` would submit a task,
// wait for it to finish, and return the output URL; `saveVideo` persists it.
declare function generateVideo(
  client: RunwayML,
  opts: { prompt: string; duration: number; aspectRatio: string },
): Promise<string>;
declare function saveVideo(videoUrl: string): Promise<void>;

async function generateMoodAppUGC(scenario: string): Promise<string> {
  const prompt = `
Realistic, natural lighting. Shot composition appropriate for the scenario.
${scenario}

Character: diverse, relatable person
Style: authentic UGC, not staged/commercial
Duration: 8-10 seconds
`;

  return generateVideo(runway, {
    prompt,
    duration: 10,
    aspectRatio: "9:16", // TikTok/Instagram vertical
  });
}

// Generate variations
const scenarios = [
  "Woman checking her phone at breakfast, sees notification, smiles",
  "Couple sitting on couch, passing phone back and forth, both smiling",
  "Teenager in bedroom, taps phone quickly before sleeping",
  "Parent showing child phone screen, both looking engaged",
];

for (const scenario of scenarios) {
  const video = await generateMoodAppUGC(scenario);
  await saveVideo(video);
}
```

**Estimated Pipeline Costs**:
- 4 videos × $0.35 average = $1.40
- 100 videos/month = $35
- 1,000 videos/month = $350 (scale pricing may apply)
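
The projections above scale linearly with volume, which makes them easy to parameterize. Below is a minimal sketch that reproduces them from the $0.35 average used here. The `generationsPerKeep` parameter is an added assumption that accounts for re-rolls (how many generations it takes to get one usable clip); measure it in your own tests.

```typescript
/**
 * Projected spend in USD, rounded to cents:
 * kept videos x generations per kept video x average cost per generation.
 */
function projectCost(keptVideos: number, avgCostPerGeneration: number, generationsPerKeep = 1): number {
  return Math.round(keptVideos * generationsPerKeep * avgCostPerGeneration * 100) / 100;
}

for (const volume of [4, 100, 1000]) {
  console.log(`${volume} videos: $${projectCost(volume, 0.35)}`); // $1.4, $35, $350
}

// If each usable clip takes about three attempts, 100 clips cost three times as much.
console.log(projectCost(100, 0.35, 3)); // 105
```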
### Secondary Option: **Google Veo 3.1**

If you prefer:
- Native audio sync in videos
- More conservative, "safe" generation
- Integrated Google Cloud infrastructure
- Reference image consistency for characters

**Cost**: $0.40/second standard = ~$4 per 10-second video

### Budget Option: **Kling AI 3.0**

If you're price-sensitive:
- ~$0.10-$0.30 per video
- Still excellent quality (especially Kling 3.0)
- Good physics for natural gestures
- Element Library for character consistency
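
The recommendation ranks a primary option (Runway), a secondary option (Veo), and a budget option (Kling). A pipeline can therefore put all three behind one interface and fall back to the next one in order when a provider fails. Below is a minimal sketch. The `VideoProvider` interface and the stub providers are hypothetical; real adapters would wrap each vendor's SDK or an aggregator such as fal.ai.

```typescript
// Hypothetical common interface; each real adapter would wrap one vendor's API.
interface VideoProvider {
  name: string;
  generate(prompt: string): Promise<string>; // resolves to an output URL
}

/** Tries providers in priority order and returns the first successful result. */
async function generateWithFallback(
  providers: VideoProvider[],
  prompt: string,
): Promise<{ provider: string; url: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      return { provider: provider.name, url: await provider.generate(prompt) };
    } catch (err) {
      errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ")})`);
}

// Stub providers for illustration only: the primary fails, the secondary succeeds.
const primary: VideoProvider = {
  name: "runway",
  generate: async () => {
    throw new Error("rate limited");
  },
};
const secondary: VideoProvider = {
  name: "veo",
  generate: async (prompt) => `https://example.com/${encodeURIComponent(prompt)}.mp4`,
};

generateWithFallback([primary, secondary], "couple on couch").then((r) => console.log(r.provider)); // veo
```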

---

## NODE.JS IMPLEMENTATION CHECKLIST

- [ ] Install Runway SDK or use their REST API
- [ ] Set up authentication (API keys in environment)
- [ ] Create prompt templates for each UGC scenario
- [ ] Implement video generation with error handling/retries
- [ ] Set up webhook/polling for async generation
- [ ] Download and organize generated videos
- [ ] (Optional) Integrate video compositing library for phone screen mockups
- [ ] Create variation generator (prompt templates with parameters)
- [ ] Implement quality/consistency checks
- [ ] Log all API calls, costs, and video metadata
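
For the "error handling/retries" item, a common pattern is to wrap each generation call in exponential backoff with jitter, for example `withRetry(() => generateMoodAppUGC(scenario))`. Below is a minimal, SDK-agnostic sketch. Which errors count as transient (rate limits, timeouts, 5xx responses) is an assumption; adapt it to each provider's documented error codes through `shouldRetry`.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Runs `fn`, making up to `maxAttempts` attempts in total. Before attempt n+1 it waits
 * baseDelayMs * 2^(n-1) plus up to baseDelayMs of random jitter.
 */
async function withRetry<T>(
  fn: () => Promise<T>,
  { maxAttempts = 4, baseDelayMs = 1000, shouldRetry = (_err: unknown): boolean => true } = {},
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !shouldRetry(err)) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1) + Math.random() * baseDelayMs);
    }
  }
}

// Usage: a flaky call that succeeds on the third attempt.
let attempts = 0;
withRetry(
  async () => {
    if (++attempts < 3) throw new Error("429 Too Many Requests");
    return "ok";
  },
  { baseDelayMs: 50 },
).then((result) => console.log(result, `after ${attempts} attempts`));
```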

---

## PLATFORMS TO AVOID FOR THIS USE CASE

- ❌ **HeyGen**: Talking-head avatars, not lifestyle
- ❌ **Synthesia**: Corporate/training videos, not authentic UGC
- ❌ **D-ID**: Real-time chatbot avatars, not pre-recorded lifestyle
- ❌ **Tavus**: Expensive for batch generation, conversation-focused
- ❌ **Sora**: No public API, can't automate
- ❌ **Pika**: Good, but less consistent characters than Runway/Veo

---

## KEY METRICS COMPARISON TABLE
| Platform | Phone Interaction | API | Diverse Cast | Est. API Cost | Quality | Ease of Integration |
|----------|-------------------|-----|--------------|---------------|---------|---------------------|
| **Runway Gen-4.5** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | $0.25-$0.50/video | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Google Veo 3.1** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | $0.40/sec (~$4 per 10 s) | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| **Kling AI 3.0** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | $0.10-$0.30/video | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| **Creatify** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | $0.80-$8.00/video | ⭐⭐⭐ | ⭐⭐⭐ |
| **MakeUGC** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | <$10/video | ⭐⭐⭐ | ⭐⭐⭐ |
| **Arcads** | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | $11/video | ⭐⭐⭐ | ⭐⭐⭐ |
| **HeyGen** | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | $0.50-$0.99/credit | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| **Synthesia** | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | $0.33-$1.00 | ⭐⭐⭐ | ⭐⭐⭐ |
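
To turn the star ratings into a ranking that reflects your own priorities, you can weight each column and sort. Below is a minimal sketch. The ratings are transcribed from the table above (a subset of rows). The weights are an assumption that favors phone interaction for lifestyle clips; tune them to your needs.

```typescript
type Ratings = { phone: number; api: number; cast: number; quality: number; integration: number };

// Star ratings (out of 5) transcribed from the comparison table.
const platforms: Record<string, Ratings> = {
  "Runway Gen-4.5": { phone: 5, api: 5, cast: 4, quality: 5, integration: 5 },
  "Google Veo 3.1": { phone: 5, api: 4, cast: 4, quality: 5, integration: 4 },
  "Kling AI 3.0": { phone: 4, api: 4, cast: 4, quality: 4, integration: 4 },
  Creatify: { phone: 3, api: 4, cast: 4, quality: 3, integration: 3 },
  HeyGen: { phone: 2, api: 5, cast: 3, quality: 3, integration: 4 },
};

// Example weights (assumption); they sum to 1, so scores stay on the 0-5 star scale.
const weights: Ratings = { phone: 0.4, api: 0.2, cast: 0.15, quality: 0.15, integration: 0.1 };

function weightedScore(r: Ratings, w: Ratings): number {
  return (Object.keys(w) as (keyof Ratings)[]).reduce((sum, k) => sum + r[k] * w[k], 0);
}

const ranked = Object.entries(platforms)
  .map(([name, r]) => ({ name, score: weightedScore(r, weights) }))
  .sort((a, b) => b.score - a.score);

for (const { name, score } of ranked) console.log(`${name}: ${score.toFixed(2)}`);
```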

---

## SOURCES

### Primary Research Sources
- [Runway Gen-4 Research](https://runwayml.com/research/introducing-runway-gen-4)
- [Runway API Documentation](https://runwayml.com/api)
- [Google Veo 3.1 Announcement](https://developers.googleblog.com/introducing-veo-3-1-and-new-creative-capabilities-in-the-gemini-api/)
- [Google Veo API Docs](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/veo-video-generation)
- [Kling O1 Introduction (Higgsfield)](https://higgsfield.ai/kling-o1-intro)
- [HeyGen API Pricing](https://www.heygen.com/api-pricing)
- [HeyGen February 2026 Release](https://www.heygen.com/blog/heygen-february-2026-release)
- [Synthesia API Docs](https://docs.synthesia.io/reference/introduction)
- [Synthesia Pricing 2026](https://www.synthesia.io/pricing)
- [D-ID V4 Announcement](https://www.d-id.com/news/v4-expressive-visual-agents-real-time-llm-connected-interaction/)
- [MakeUGC Platform API](https://app.makeugc.ai/api/platform/documentation)
- [Creatify API](https://creatify.ai/api)
- [Tavus Pricing](https://www.tavus.io/pricing)
- [Arcads AI Features](https://www.arcads.ai/features/)
- [Pika API via fal.ai](https://blog.fal.ai/pika-api-is-now-powered-by-fal)
- [AI Video Generation APIs 2025](https://www.tavus.io/post/high-quality-ai-video-api)
- [Best AI Video Generators 2026](https://zapier.com/blog/best-ai-video-generator/)

---
## NEXT STEPS

1. **Sign up for Runway API** with test credits
2. **Create prompt templates** for your 4 use cases
3. **Test generation** with various prompts and durations
4. **Measure quality** and iteration requirements
5. **Calculate actual costs** from real API usage
6. **Build Node.js pipeline** with error handling
7. **Implement variation system** (prompt parameters, style options)
8. **Monitor and optimize** prompts based on output quality

---

**Last Updated**: March 2026

**Research Methodology**: Comprehensive web search of 2025-2026 platform releases, API documentation, and pricing structures.