Kandinsky 5.0 is a family of diffusion models for video and image generation developed by Kandinsky Lab. Kandinsky 5.0 T2V Lite is a lightweight 2B-parameter model that ranks among the top open-source video generation models and can generate videos up to 10 seconds long.
Make sure your ComfyUI is up to date. The workflows in this guide are available in the Workflow Templates; if you can't find them there, your ComfyUI is probably outdated. (Updates to the Desktop version may lag behind other releases.)

If nodes are missing when you load a workflow, possible reasons are:
  1. You are not using the latest ComfyUI version (the Nightly version)
  2. Some nodes failed to import at startup

Overview

Kandinsky 5.0 uses a latent diffusion pipeline with Flow Matching and features:
  • Diffusion Transformer (DiT): Main generative backbone with cross-attention to text embeddings
  • Qwen2.5-VL and CLIP: Provide high-quality text embeddings
  • HunyuanVideo 3D VAE: Encodes and decodes video into a latent space
The model family includes multiple variants optimized for different use cases:
  • SFT model: Highest generation quality
  • CFG-distilled: 2× faster inference
  • Diffusion-distilled: 6× faster with minimal quality loss (16 steps)
  • Pretrain model: Designed for fine-tuning
All T2V Lite variants are available in 5-second and 10-second versions.
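The speed differences between variants come down to how many times the DiT is evaluated (NFE, number of function evaluations). The sketch below is a toy Euler sampler for a flow-matching ODE, using one common convention (t runs from 1 for pure noise to 0 for clean data). It is not Kandinsky's actual sampler: `velocity` is a hypothetical stand-in for the DiT, and a single float stands in for a video latent. It only shows why classifier-free guidance (CFG) costs two evaluations per step. Assuming the SFT model runs 50 steps with CFG, that gives the 100 NFE listed in the table below, 50 steps without CFG give 50, and a 16-step distilled sampler gives 16.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signature: velocity(x, t, conditioning) -> predicted velocity.
Velocity = Callable[[float, float, float], float]


def sample_flow_matching(
    velocity: Velocity,
    x: float,
    num_steps: int,
    cond: float,
    uncond: Optional[float] = None,
    guidance_scale: Optional[float] = None,
) -> Tuple[float, int]:
    """Toy Euler sampler for a flow-matching ODE; returns (sample, NFE)."""
    if guidance_scale is not None and uncond is None:
        raise ValueError("CFG requires an unconditional input")
    # Uniform time grid from t=1 (noise) down to t=0 (data).
    ts = [1.0 - i / num_steps for i in range(num_steps + 1)]
    nfe = 0
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = velocity(x, t, cond)  # conditional prediction: 1 NFE
        nfe += 1
        if guidance_scale is not None:
            # CFG needs a second, unconditional prediction: +1 NFE per step.
            v_uncond = velocity(x, t, uncond)
            nfe += 1
            v = v_uncond + guidance_scale * (v - v_uncond)
        x = x + (t_next - t) * v  # Euler step; t_next < t, so the step is negative
    return x, nfe


def toy_velocity(x: float, t: float, target: float) -> float:
    # Exact velocity of the straight path x_t = (1 - t) * target + t * noise.
    return (x - target) / t
```

For example, `sample_flow_matching(toy_velocity, 1.3, 16, cond=0.5)` ends at 0.5 (up to float error) after 16 evaluations, because Euler integration is exact on straight paths. Passing `uncond=0.0, guidance_scale=5.0` with 50 steps doubles the count to 100.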

Model variants

| Model | Video duration | NFE | Latency (H100) |
|---|---|---|---|
| Kandinsky 5.0 T2V Lite SFT | 5s / 10s | 100 | 139s / 224s |
| Kandinsky 5.0 T2V Lite no-CFG | 5s / 10s | 50 | 77s / 124s |
| Kandinsky 5.0 T2V Lite distill | 5s / 10s | 16 | 35s / 61s |
| Kandinsky 5.0 I2V Lite | 5s | 100 | 673s |
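The speedup figures in the variant list can be cross-checked against this table. The short script below copies the three T2V rows (the I2V row is left out because it only has a 5-second version) and compares each variant's NFE reduction with its measured latency reduction, using SFT as the baseline. The dictionary layout and function name are illustrative only.

```python
# NFE and H100 latencies (seconds) copied from the table above.
VARIANTS = {
    "SFT": {"nfe": 100, "latency_5s": 139, "latency_10s": 224},
    "no-CFG": {"nfe": 50, "latency_5s": 77, "latency_10s": 124},
    "distill": {"nfe": 16, "latency_5s": 35, "latency_10s": 61},
}


def speedups(variants: dict, baseline: str = "SFT") -> dict:
    """Ratio of baseline NFE/latency to each variant's NFE/latency."""
    base = variants[baseline]
    return {
        name: {
            "nfe_ratio": base["nfe"] / v["nfe"],
            "latency_ratio_5s": base["latency_5s"] / v["latency_5s"],
            "latency_ratio_10s": base["latency_10s"] / v["latency_10s"],
        }
        for name, v in variants.items()
    }


for name, r in speedups(VARIANTS).items():
    print(
        f"{name}: {r['nfe_ratio']:.2f}x fewer NFE, "
        f"{r['latency_ratio_5s']:.2f}x / {r['latency_ratio_10s']:.2f}x lower latency (5s / 10s)"
    )
```

On these numbers, NFE drops by 2× and 6.25×, while latency drops by about 1.8× (no-CFG) and about 3.7 to 4.0× (distill). A plausible reading, which the table itself does not confirm, is that per-clip work outside the denoising loop, such as text encoding and VAE decoding, does not shrink with fewer steps.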

Text-to-Video workflow

1. Download workflow file

Update ComfyUI to the latest version, then open Workflow -> Browse Templates -> Video and select “Kandinsky 5.0 T2V” to load the workflow.

Download JSON Workflow File
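The template is meant to be run from the ComfyUI interface. If you would rather queue it from a script, ComfyUI's local server also accepts jobs as a JSON POST to its `/prompt` endpoint. The sketch below assumes the server runs at the default `http://127.0.0.1:8188` and that you first re-exported the loaded workflow in API format from ComfyUI's Workflow menu; the downloadable template JSON is in the UI format and is not accepted as-is. The file path you pass in (for example a hypothetical `kandinsky5_t2v_api.json`) is your own.

```python
import json
import urllib.request
from typing import Optional

COMFYUI_URL = "http://127.0.0.1:8188"  # assumption: default local ComfyUI address


def build_prompt_request(workflow_api: dict, client_id: Optional[str] = None) -> bytes:
    """Wrap an API-format workflow in the JSON body sent to POST /prompt."""
    body = {"prompt": workflow_api}
    if client_id is not None:
        body["client_id"] = client_id  # optional; used to match websocket progress messages
    return json.dumps(body).encode("utf-8")


def queue_workflow(path: str, base_url: str = COMFYUI_URL) -> dict:
    """Load an API-format workflow file and queue it on a running ComfyUI server."""
    with open(path, encoding="utf-8") as f:
        workflow_api = json.load(f)
    req = urllib.request.Request(
        f"{base_url}/prompt",
        data=build_prompt_request(workflow_api),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With the server running, `queue_workflow("kandinsky5_t2v_api.json")` should return a small JSON object that includes a `prompt_id` for the queued job; the generated video is then written to ComfyUI's output folder as usual.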

2. Manually download models

Download the text encoders, diffusion model, and VAE, then place them as follows:
ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │      ├── qwen_2.5_vl_7b_fp8_scaled.safetensors
│   │      └── clip_l.safetensors
│   ├── 📂 diffusion_models/
│   │      └── kandinsky5lite_t2v_sft_5s.safetensors
│   └── 📂 vae/
│          └── hunyuan_video_vae_bf16.safetensors
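Missing or misplaced model files are a common cause of loader errors. The helper below is a hypothetical convenience script, not part of ComfyUI: it walks the layout shown in the tree above and lists any file that is not where the workflow expects it. It assumes the default `ComfyUI/models` folder; if you use `extra_model_paths.yaml` or a different install location, adjust the root accordingly.

```python
from pathlib import Path
from typing import Dict, List

# Expected layout for the T2V workflow, copied from the tree above.
T2V_MODELS: Dict[str, List[str]] = {
    "text_encoders": ["qwen_2.5_vl_7b_fp8_scaled.safetensors", "clip_l.safetensors"],
    "diffusion_models": ["kandinsky5lite_t2v_sft_5s.safetensors"],
    "vae": ["hunyuan_video_vae_bf16.safetensors"],
}


def missing_models(comfyui_root: str, required: Dict[str, List[str]] = T2V_MODELS) -> List[str]:
    """Return paths (relative to the ComfyUI root) of expected model files that are absent."""
    models_dir = Path(comfyui_root) / "models"
    missing = []
    for folder, filenames in required.items():
        for name in filenames:
            path = models_dir / folder / name
            if not path.is_file():
                missing.append(path.relative_to(comfyui_root).as_posix())
    return missing
```

Calling `missing_models("ComfyUI")` returns an empty list when everything is in place. For the I2V workflow, pass a copy of `T2V_MODELS` with the diffusion model swapped for `kandinsky5lite_i2v_sft_5s.safetensors`.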

Image-to-Video workflow

1. Download workflow file

Update ComfyUI to the latest version, then open Workflow -> Browse Templates -> Video and select “Kandinsky 5.0 I2V” to load the workflow.

Download JSON Workflow File

2. Manually download models

Download the text encoders, diffusion model, and VAE, then place them as follows:
ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │      ├── qwen_2.5_vl_7b_fp8_scaled.safetensors
│   │      └── clip_l.safetensors
│   ├── 📂 diffusion_models/
│   │      └── kandinsky5lite_i2v_sft_5s.safetensors
│   └── 📂 vae/
│          └── hunyuan_video_vae_bf16.safetensors

Resources