Vibe Video
Create Videos Like Writing Code

Language: English | Simplified Chinese
Vibe Video is a VS Code extension that lets you create videos like writing code: write scripts in Markdown, generate storyboards with AI, batch generate videos, and compose them with one click.

✨ Features
🚀 Lightweight Workflow
- Write scripts in Markdown (any format)
- Use Cursor AI to generate complete project structure with one click (subjects, scenes, storyboards, first frames)
- Standardized project structure, Git-friendly
- Fixed storyboard duration (5s/10s) for easy batch processing
🤖 Smart AI Integration
- Auto-generate .cursorrules to help AI understand project structure and workflow
- Subject library management for consistent character appearance
- Scene library management for reusable scene resources
- Image-to-video generation (higher quality, more controllable)
- Multiple AI Provider support:
- Tongyi Wanxiang API (domestic service with excellent Chinese support, ✅ Tested, production ready)
- OpenAI Sora API (supports sora-2 video generation, gpt-image-1/dall-e-3 image generation) ✅ Tested (Tongyi Wanxiang recommended)
- Replicate API (supports multiple video generation models like Zeroscope, AnimateDiff, etc.) ⚠️ Not Tested
- Google Gemini API (supports gemini-3-pro-image-preview, veo-3 models) ⚠️ Not Tested
📊 Visual Management
- Sidebar displays project resources (subjects, scenes, storyboards, first frames, videos)
- Quality checks and friendly suggestions
- Project statistics and progress tracking
- Right-click menu for quick single resource generation
🎬 Complete Workflow
Vibe Video's workflow is like writing code: Write → Generate → Review → Iterate.
```
📝 Write Script (AI-assisted)
↓
🤖 Generate Project Structure (Subjects/Scenes/Storyboards/First Frames)
↓ ↖ Review/Iterate
🖼️ Generate Image Resources (Subjects/Scenes/First Frames)
↓ ↖ Review/Iterate
🎬 Generate Video Clips
↓ ↖ Review/Iterate
🎞️ Compose Final Video
↓
✅ Complete
```
Core Philosophy: The entire process can be iterated repeatedly, just like coding. Generating resources is like "compiling", and manual review is like "finding bugs". If a result doesn't pass review, iterate as many times as needed until you are satisfied.
💡 Design Philosophy: Lightweight to the Core
Vibe Video adopts a "just right" balanced design, with one core philosophy: lightweight to the core.
🎯 Three-Layer Design Principles
1️⃣ Keep It Simple (Leverage Existing Tools)
- Storyboard Script Generation: Use Cursor AI / Copilot (provide context via .cursorrules)
- ✅ AI programming tools are already powerful, we don't need to reinvent the wheel
- ✅ The extension only provides context to help AI understand project structure
- Project Organization: Standardized file structure and naming conventions
- Resource Browsing: Simple sidebar view
2️⃣ Necessary Complexity (Unavoidable)
- Video Generation: Must call specialized video AI APIs
- ⚠️ AI programming tools could in theory drive video generation via MCP, but that would be too costly
- ✅ Keep it simple: Single Provider + basic polling
- ✅ Support image-to-video: Provide initial frame → higher quality, more controllable
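The "single Provider + basic polling" idea can be sketched in TypeScript as follows. The VideoTask shape, status values, and function names are illustrative assumptions, not the extension's actual implementation:

```typescript
// Minimal sketch of the "submit task, then poll" pattern that video
// generation APIs typically require. All field names and the VideoTask
// shape below are illustrative assumptions, not the extension's real code.
interface VideoTask {
  id: string;
  status: "PENDING" | "RUNNING" | "SUCCEEDED" | "FAILED";
  videoUrl?: string;
}

async function pollUntilDone(
  fetchTask: (id: string) => Promise<VideoTask>,
  taskId: string,
  intervalMs = 5000,
  timeoutMs = 10 * 60 * 1000,
): Promise<VideoTask> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const task = await fetchTask(taskId);
    if (task.status === "SUCCEEDED" || task.status === "FAILED") {
      return task;
    }
    // Basic fixed-interval polling; no backoff, keeping it simple.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Video task ${taskId} timed out`);
}
```

Fixed-interval polling with a hard timeout is deliberately simple; the provider's task-query endpoint is the only dependency.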
3️⃣ Optional Features
- Video Composition: Use ffmpeg (optional, doesn't affect core workflow)
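As a sketch of how ffmpeg-based composition can work: ffmpeg's concat demuxer takes a text file listing the clips and can join them without re-encoding. The helper names and paths below are illustrative, not the extension's actual code:

```typescript
// Sketch of composing clips with ffmpeg's concat demuxer.
// The clip paths and output path are examples only.
function buildConcatList(clips: string[]): string {
  // The concat demuxer reads a text file with one "file '<path>'" line per
  // clip; a single quote inside a path is escaped as '\'' (close, escaped
  // quote, reopen), following ffmpeg's quoting rules.
  return clips
    .map((c) => `file '${c.replace(/'/g, `'\\''`)}'`)
    .join("\n");
}

function buildFfmpegArgs(listFile: string, output: string): string[] {
  // "-c copy" concatenates without re-encoding, which assumes all clips
  // share the same codec, resolution, and frame rate.
  return ["-f", "concat", "-safe", "0", "-i", listFile, "-c", "copy", output];
}
```

Running ffmpeg with these arguments (e.g. via child_process.spawn) is fast because streams are copied, not re-encoded; mixed-format clips would instead need a re-encode step.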
🌟 Core Concepts
Markdown > JSON
- Storyboard scripts use Markdown, not JSON
- Intuitive and easy to read
- AI naturally understands, no complex validation needed
- Git-friendly, easy collaboration
Content > Format
- Don't obsess over format details
- Focus on teaching users to write good prompts
- Validation is auxiliary, not mandatory
- Loose parsing, strong error tolerance
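As an illustration of loose parsing, the sketch below extracts "**Key**: value" fields from a storyboard Markdown file while skipping anything it doesn't recognize. The exact field names and pattern are assumptions, not the extension's real parser:

```typescript
// Tolerant extraction of "Key: value" fields from storyboard Markdown.
// Unknown lines are ignored rather than rejected, in line with
// "loose parsing, strong error tolerance". Field names are assumptions.
function parseFields(markdown: string): Map<string, string> {
  const fields = new Map<string, string>();
  for (const line of markdown.split(/\r?\n/)) {
    // Accept "- **Key**: value", "**Key**: value", and "Key: value" loosely.
    const m = line.match(/^[-*\s]*(?:\*\*)?([^:*]+?)(?:\*\*)?\s*:\s*(.+)$/);
    if (m) {
      fields.set(m[1].trim().toLowerCase(), m[2].trim());
    }
  }
  return fields;
}
```

Because parsing never throws, a malformed storyboard degrades to "some fields missing" rather than a hard error, which the quality checks can then report as friendly suggestions.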
Convention Over Configuration
- Make projects "self-explanatory" through standardized file structure
- AI can automatically understand project intent
- Reduce configuration, improve efficiency
- Don't reinvent the wheel
- When users use Cursor, Copilot, Claude, etc., the extension provides context
- Extension is an assistant, not a controller
Practicality First
- Don't pursue perfect abstract design
- Focus on core user value (batch video generation)
- Fast iteration: 6 weeks instead of 3-4 months
- Friendly prompts, not strict errors
📊 Minimal Tech Stack
```
VS Code Extension
├── TypeScript 5.x
├── VS Code Extension API
├── Simple TreeView (resource browsing)
└── Optional: ffmpeg (video composition)

Configuration Files (for AI understanding)
├── .cursorrules / .clinerules
└── Standardized project structure
```
Production Dependencies:
- Tongyi Wanxiang API (optional, default, ✅ Tested, production ready)
- OpenAI Sora API (optional, ✅ Tested, Tongyi Wanxiang recommended)
- Replicate API (optional, ⚠️ Not Tested)
- Google Gemini API (optional, ⚠️ Not Tested)
🎓 Why This Design?
- ✅ No Over-Engineering: Avoid complexity of heavy solutions
- ✅ No Over-Simplification: Ensure core functionality is usable
- ✅ Fast Iteration: Complete usable version in 6 weeks
- ✅ User Control: Users can manually edit any file, skip any step
- ✅ Low Cost: Leverage users' existing AI tool subscriptions
Remember: This is a lightweight tool. The core is to help users "organize" and "standardize", not "automate everything". Keeping it simple and focused is key to success.
🚀 Quick Start
1. Initialize Project
Ctrl+Shift+P → "Vibe Video: Initialize Project"
Or click "Initialize Project" in the left Vibe Video resource tree
2. Write Script
Edit Script.md and write your video script
3. Use AI to Generate Complete Project Structure
In Cursor AI Chat, enter:
Generate project based on Script.md
AI will automatically execute the following steps:
- Extract Subjects: Extract the main characters/subjects from the script and save them to the subjects/ directory
- Extract Scenes: Extract the scenes from the script and save them to the scenes/ directory
- Split Storyboards: Split scenes into 5s/10s storyboard units (storyboard duration can only be 5s or 10s)
- Generate Storyboard Scripts: Write a detailed script for each storyboard and save it to the storyboards/ directory
- Generate First Frame Descriptions: Write a first frame description for each storyboard and save it to the first-frames/ directory
4. Configure Video AI
Method 1: Use Command
Ctrl+Shift+P → "Vibe Video: Configure Video AI"
→ Click "Open Settings"
Method 2: Open Settings Directly (Recommended)
Ctrl+, → Search "vibevideo"
→ Select Provider (Tongyi Wanxiang, Replicate, or Google Gemini)
→ Enter API Key/Token
Supported Providers: Tongyi Wanxiang, OpenAI Sora, Replicate, and Google Gemini.
⚠️ Testing Status Notice:
- ✅ Tested: Tongyi Wanxiang has been fully tested, all features working normally, strongly recommended for production
- ✅ Tested: OpenAI Sora has been tested, but Tongyi Wanxiang is recommended
- ❌ Not Tested: Other Providers (Replicate, Google Gemini) are implemented but not tested with actual APIs
Configuration is automatically saved to VS Code settings
5. AI Models Used
Vibe Video uses the following AI models for different generation tasks:
🤖 Tongyi Wanxiang (DashScope) - ✅ Tested, Production Ready
| Task Type | Model | Description | Testing Status |
| --- | --- | --- | --- |
| Text-to-Image | wan2.5-t2i-preview | Generate images from text prompts (for subjects, scenes, first frames) | ✅ Tested |
| Image-to-Image | wan2.5-i2i-preview | Compose multiple images (for combining subjects + scenes) | ✅ Tested |
| Text-to-Video | wan2.5-i2v-preview | Generate videos directly from text prompts | ✅ Tested |
| Image-to-Video | wan2.5-i2v-preview | Generate videos from initial frame images | ✅ Tested |
| First-Last Frame to Video | wan2.2-kf2v-flash | Generate videos from first and last frame images (for precise control) | ✅ Tested |
| Image Editing | qwen-image-edit-plus | Edit images using text prompts (change background, add elements, etc.) | ✅ Tested |
🌐 Replicate - ⚠️ Not Tested
| Task Type | Default Model | Description | Testing Status |
| --- | --- | --- | --- |
| Text-to-Image | stability-ai/sdxl | Generate images from text prompts | ❌ Not Tested |
| Image-to-Image | - | Image editing feature | ❌ Not Supported |
| Text-to-Video | anotherjesse/zeroscope-v2-xl | Generate videos from text prompts | ❌ Not Tested |
| Image-to-Video | anotherjesse/zeroscope-v2-xl | Generate videos from initial frame images | ❌ Not Tested |
| First-Last Frame to Video | - | Generate videos from first and last frame images | ❌ Not Supported |
🎬 OpenAI Sora - ✅ Tested (Tongyi Wanxiang Recommended)
| Task Type | Default Model | Description | Testing Status |
| --- | --- | --- | --- |
| Text-to-Image | gpt-image-1 | Generate images from text prompts (also supports dall-e-3) | ✅ Tested |
| Image-to-Image | gpt-image-1 | Image editing feature (supports multi-image composition) | ✅ Tested |
| Text-to-Video | sora-2 | Generate videos from text prompts | ✅ Tested |
| Image-to-Video | sora-2 | Generate videos from initial frame images | ✅ Tested |
| First-Last Frame to Video | - | Generate videos from first and last frame images | ❌ Not Supported |
🔷 Google Gemini - ⚠️ Not Tested
| Task Type | Default Model | Description | Testing Status |
| --- | --- | --- | --- |
| Text-to-Image | gemini-3-pro-image-preview | Generate images from text prompts | ❌ Not Tested |
| Image-to-Image | gemini-2.5-flash-image | Image editing feature | ❌ Not Tested |
| Text-to-Video | veo-3 | Generate videos from text prompts | ❌ Not Tested |
| Image-to-Video | veo-3 | Generate videos from initial frame images | ❌ Not Tested |
| First-Last Frame to Video | - | Generate videos from first and last frame images | ❌ Not Supported |
⚠️ Testing Status Notice:
- ✅ Tongyi Wanxiang: Fully tested, all features working normally, strongly recommended for production environment
- ✅ OpenAI Sora: Tested, all features working normally, but Tongyi Wanxiang is recommended
- ❌ Other Providers (Replicate, Google Gemini): Code implemented but not tested with actual APIs
- May encounter undiscovered bugs
- API format may not match expectations
- Features may be incomplete
- If you find issues, please submit an Issue
Note: Replicate, OpenAI Sora, and Google Gemini models can be customized in settings. The models listed above are defaults.
6. Generate Resources
Use the sidebar resource view or commands:
- Generate Subject Images: Vibe Video: Generate All Subjects
- Generate Scene Images: Vibe Video: Generate All Scenes
- Generate First Frame Images: Vibe Video: Generate First Frames
- Generate Videos: Vibe Video: Generate All Videos
📁 Project Structure
```
MyVideoProject/
├── Script.md                         # Your script
├── subjects/                         # Subjects/Characters (description + generated images)
│   ├── main-character.md             # Subject description
│   ├── main-character.png            # Generated subject image
│   └── ...
├── scenes/                           # Scenes (description + generated images)
│   ├── city-street.md                # Scene description
│   ├── city-street.png               # Generated scene image
│   └── ...
├── storyboards/                      # Storyboard scripts (Markdown)
│   ├── 01-opening.md
│   └── ...
├── first-frames/                     # First frames (description + generated images)
│   ├── 01-opening-first-frame.md     # First frame description
│   ├── 01-opening-first-frame.png    # Generated first frame image
│   └── ...
├── video-clip/                       # Generated video clips
│   ├── 01-opening.mp4
│   └── ...
├── ref-img/                          # User-defined reference images (optional)
│   └── product.jpg
├── output/                           # Final composed video
│   └── final.mp4
├── .vv-context/                      # AI context documents (auto-generated)
├── .temp/                            # Temporary files
├── .cursorrules                      # Cursor AI rules (auto-generated)
├── .clinerules                       # Cline AI rules (auto-generated)
└── .vv-project.json                  # Project configuration (auto-generated)
```
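Because the structure is standardized, every resource path can be derived from a storyboard's base name alone, which is what lets AI tools and the extension "understand" a project without configuration. The helper below is a hypothetical illustration of that convention, not part of the extension's API:

```typescript
// Hypothetical illustration of "convention over configuration": given only
// a storyboard base name, all related resource paths follow from the
// standardized layout shown above.
function storyboardPaths(name: string) {
  return {
    script: `storyboards/${name}.md`,
    firstFrameDesc: `first-frames/${name}-first-frame.md`,
    firstFrameImage: `first-frames/${name}-first-frame.png`,
    videoClip: `video-clip/${name}.mp4`,
  };
}
```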
📝 Important Notes
- Storyboard Duration: Each storyboard can only be 5s or 10s long; other durations are not supported
- Subject Function: Used to keep character appearance consistent; subject images have a pure white background
- Reference Images: Specify images to reference, identified by file path URLs
- Last Frame Function: Add a **Last Frame**: first-frames/xxx-last-frame.png field to a storyboard script to enable first-last frame video generation for more precise control ⭐ New
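Putting these conventions together, a storyboard script might look like the following sketch; the field set shown here (Duration, Subject, Scene, Last Frame) is illustrative, not a fixed schema:

```markdown
# 01-opening

- **Duration**: 5s
- **Subject**: subjects/main-character.md
- **Scene**: scenes/city-street.md
- **Last Frame**: first-frames/01-opening-last-frame.png

The main character walks down the city street at dawn, camera slowly pulling back.
```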
📋 Requirements
- VS Code 1.105.0 or higher
- Node.js 18+
- (Optional) ffmpeg - for video composition
🎯 Command List
Project Management
Vibe Video: Initialize Project - Initialize project structure
Vibe Video: Check Storyboards Quality - Check storyboard quality
Vibe Video: Show Project Stats - Show project statistics
Vibe Video: Refresh Resources - Refresh resource view
Configuration
Vibe Video: Configure Video AI - Configure video AI service
Vibe Video: Show Current Config - Show current configuration
Generate Resources
Vibe Video: Generate All Subjects - Batch generate all subject images
Vibe Video: Generate All Scenes - Batch generate all scene images
Vibe Video: Generate First Frames - Batch generate all first frame images
Vibe Video: Generate All Videos - Batch generate all videos
Vibe Video: Compose All First Frames - Compose first frames using subjects and scenes
Vibe Video: Generate All Videos From First Last Frame - Generate all videos from first and last frames ⭐
Vibe Video: Compose Video - Compose all video clips into final video ⭐ New
Generate Subject - Generate single subject image
Generate Scene - Generate single scene image
Generate Video - Generate single video clip
Generate Video From First Last Frame - Generate video from first and last frames ⭐
Edit Image - Edit image using AI (change background, add elements, etc.) ⭐ New
🚧 Development Status
Current Version: 0.0.9 (Alpha)
✅ Implemented
- Project initialization (including subjects, scenes directories)
- Markdown storyboard parsing (supports subjects, scenes, reference images, last frame)
- Subject library management (generate subject images)
- Scene library management (generate scene images)
- First frame generation (text-to-image)
- Video generation (image-to-video, based on first frames)
- First-last frame video generation (using first frame + last frame) ⭐
- Multi-image composition (subject + scene → first frame)
- Video composition (FFmpeg-based video merging) ⭐
- Image editing (AI-powered image editing using text prompts) ⭐ New
- Quality checks
- Sidebar resource view
- Multiple AI Provider support:
- Tongyi Wanxiang API integration (default, ✅ Tested, production ready, supports first-last frame video generation)
- OpenAI Sora API integration (supports sora-2 video generation, gpt-image-1/dall-e-3 image generation) ✅ Tested (Tongyi Wanxiang recommended)
- Replicate API integration (supports Zeroscope, AnimateDiff, SDXL, and more models) ⚠️ Not Tested
- Google Gemini API integration (supports gemini-3-pro-image-preview, veo-3 models) ⚠️ Not Tested
🚧 In Development
- Parallel generation optimization
- More AI Provider Support:
- ✅ OpenAI Sora Provider support (implemented and tested; Tongyi Wanxiang recommended)
- Claude Code Skills Support:
  - Integrate Claude Code's skills functionality to improve prompt quality
  - Provide a rich library of high-quality prompt examples (product showcase, lifestyle, story scenarios, etc.)
  - Automatically optimize subject, scene, and first frame descriptions based on the example library
  - Let AI learn best practices through skills and generate more professional prompts that meet video production requirements
  - Improve the consistency, accuracy, and executability of AI-generated content
📚 Documentation
Detailed documentation can be found in the DOC/ directory.
🤝 Contributing
Issues and Pull Requests are welcome!
For questions or suggestions, please contact:
- 📧 Email: cici_yiyi@qq.com
- 💬 WeChat: Scan QR code to add (QR code image)
- 👥 QQ Group: 454222772

💼 Service Support
- 🔧 Technical Support: Troubleshooting and help resolving problems during use
- 🎨 Custom Development: Feature customization and secondary development services tailored to your needs
📄 License
Vibe Video uses a dual licensing model:
- GPL v3: For open source projects and individual developers (see LICENSE)
- Commercial License: For commercial users who need to use it in proprietary software or do not want to comply with GPL terms (see LICENSE-COMMERCIAL.md)
Choosing a License
- Open Source Use: If you are an open source project or individual developer, you can use the GPL v3 license directly, completely free
- Commercial Use: If you need to use it in proprietary software or do not want to open source derivative works, please purchase a commercial license
To purchase a commercial license, please contact:
- 📧 Email: cici_yiyi@qq.com
- 💬 WeChat: Scan QR code to add
- 👥 QQ Group: 454222772
Enjoy creating videos with Vibe Video! 🎬