Kandinsky 5.0 is a family of diffusion models for video and image generation developed by Kandinsky Lab. Kandinsky 5.0 T2V Lite is a lightweight 2B-parameter model that ranks among the top open-source video generation models and can generate videos up to 10 seconds long.
Make sure your ComfyUI is updated. Workflows in this guide can be found in the Workflow Templates.
If you can’t find them in the templates, your ComfyUI may be outdated (updates to the Desktop version can lag behind).
If nodes are missing when loading a workflow, possible reasons:
- You are not using the latest ComfyUI version (Nightly version)
- Some nodes failed to import at startup
Overview
Kandinsky 5.0 uses a latent diffusion pipeline with Flow Matching and features:
- Diffusion Transformer (DiT): Main generative backbone with cross-attention to text embeddings
- Qwen2.5-VL and CLIP: Provide high-quality text embeddings
- HunyuanVideo 3D VAE: Encodes and decodes video into a latent space
The model family includes multiple variants optimized for different use cases:
- SFT model: Highest generation quality
- CFG-distilled: 2× faster inference
- Diffusion-distilled: 6× faster with minimal quality loss (16 steps)
- Pretrain model: Designed for fine-tuning
All models are available in 5-second and 10-second video generation versions.
Model variants
| Model | Video Duration | NFE | Latency (H100) |
|---|---|---|---|
| Kandinsky 5.0 T2V Lite SFT | 5s / 10s | 100 | 139s / 224s |
| Kandinsky 5.0 T2V Lite no-CFG | 5s / 10s | 50 | 77s / 124s |
| Kandinsky 5.0 T2V Lite distill | 5s / 10s | 16 | 35s / 61s |
| Kandinsky 5.0 I2V Lite | 5s | 100 | 673s |
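
The NFE and latency columns can be compared directly. As a minimal sketch, the snippet below copies the T2V Lite figures from the table and computes each variant's ratio to the SFT model, both in number of function evaluations and in measured H100 latency. The dictionary layout and the `speedups` helper are illustrative names, not part of ComfyUI or Kandinsky.

```python
# Figures copied from the table above:
# variant -> (NFE, latency in seconds for 5s video, latency in seconds for 10s video)
T2V_LITE = {
    "SFT": (100, 139, 224),
    "no-CFG": (50, 77, 124),
    "distill": (16, 35, 61),
}


def speedups(baseline="SFT"):
    """Return {variant: (nfe_ratio, latency_ratio_5s, latency_ratio_10s)} relative to baseline."""
    base_nfe, base_5s, base_10s = T2V_LITE[baseline]
    return {
        name: (base_nfe / nfe, base_5s / lat_5s, base_10s / lat_10s)
        for name, (nfe, lat_5s, lat_10s) in T2V_LITE.items()
    }


if __name__ == "__main__":
    for name, (nfe_ratio, ratio_5s, ratio_10s) in speedups().items():
        print(f"{name:8s} NFE x{nfe_ratio:.2f}  5s x{ratio_5s:.2f}  10s x{ratio_10s:.2f}")
```

The latency ratios come out smaller than the NFE ratios, which is expected if part of the wall-clock time (for example text encoding and VAE decoding) does not scale with the number of diffusion steps.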
Text-to-Video workflow
1. Download workflow file
Update ComfyUI to the latest version, then open Workflow -> Browse Templates -> Video and select “Kandinsky 5.0 T2V” to load the workflow.
Download JSON Workflow File
2. Manually download models
Text Encoders
Diffusion Model
VAE
ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │   ├── qwen_2.5_vl_7b_fp8_scaled.safetensors
│   │   └── clip_l.safetensors
│   ├── 📂 diffusion_models/
│   │   └── kandinsky5lite_t2v_sft_5s.safetensors
│   └── 📂 vae/
│       └── hunyuan_video_vae_bf16.safetensors
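
After downloading, it can help to confirm that each file landed in the folder shown above. The following is a minimal sketch using only the Python standard library; the `missing_model_files` helper and the default `ComfyUI` root path are assumptions for illustration, so adjust the root to your own install location.

```python
from pathlib import Path
import sys

# Relative paths taken from the directory tree above.
T2V_MODEL_FILES = [
    "models/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "models/text_encoders/clip_l.safetensors",
    "models/diffusion_models/kandinsky5lite_t2v_sft_5s.safetensors",
    "models/vae/hunyuan_video_vae_bf16.safetensors",
]


def missing_model_files(comfyui_root, expected=T2V_MODEL_FILES):
    """Return the expected files that do not exist under comfyui_root."""
    root = Path(comfyui_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the ComfyUI install directory is given as the first
    # command-line argument, defaulting to ./ComfyUI.
    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    missing = missing_model_files(root)
    for rel in missing:
        print(f"missing: {rel}")
    if not missing:
        print("all Kandinsky 5.0 T2V model files found")
```

The check only looks at file presence, not at file size or integrity, so a partially downloaded file would still pass.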
Image-to-Video workflow
1. Download workflow file
Update ComfyUI to the latest version, then open Workflow -> Browse Templates -> Video and select “Kandinsky 5.0 I2V” to load the workflow.
Download JSON Workflow File
2. Manually download models
Text Encoders
Diffusion Model
VAE
ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │   ├── qwen_2.5_vl_7b_fp8_scaled.safetensors
│   │   └── clip_l.safetensors
│   ├── 📂 diffusion_models/
│   │   └── kandinsky5lite_i2v_sft_5s.safetensors
│   └── 📂 vae/
│       └── hunyuan_video_vae_bf16.safetensors
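
Comparing this tree with the Text-to-Video one shows that the two workflows share most of their files. As a small sketch, the set operations below list which files are common and which are needed only for Image-to-Video; the variable names are illustrative and the paths are copied from the two trees in this guide.

```python
# File lists copied from the Text-to-Video and Image-to-Video trees in this guide.
T2V_FILES = {
    "models/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "models/text_encoders/clip_l.safetensors",
    "models/diffusion_models/kandinsky5lite_t2v_sft_5s.safetensors",
    "models/vae/hunyuan_video_vae_bf16.safetensors",
}
I2V_FILES = {
    "models/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "models/text_encoders/clip_l.safetensors",
    "models/diffusion_models/kandinsky5lite_i2v_sft_5s.safetensors",
    "models/vae/hunyuan_video_vae_bf16.safetensors",
}

# Files used by both workflows (set intersection).
shared = T2V_FILES & I2V_FILES
# Files required only by the Image-to-Video workflow (set difference).
i2v_only = I2V_FILES - T2V_FILES

if __name__ == "__main__":
    print("shared with T2V:")
    for rel in sorted(shared):
        print(f"  {rel}")
    print("needed only for I2V:")
    for rel in sorted(i2v_only):
        print(f"  {rel}")
```

If the Text-to-Video models are already in place, only the Image-to-Video diffusion model needs to be added.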
Resources