Kandinsky 5.0

Kandinsky 5.0 is a family of diffusion models for video and image generation developed by Kandinsky Lab. Kandinsky 5.0 T2V Lite is a lightweight 2B-parameter model that ranks among the top open-source video generation models and can generate videos up to 10 seconds long.


Make sure your ComfyUI is updated.

Workflows in this guide can be found in the Workflow Templates. If you can’t find them there, your ComfyUI may be outdated (updates to the Desktop version lag behind somewhat).

If nodes are missing when loading a workflow, possible reasons:

You are not using the latest ComfyUI version (Nightly version)
Some nodes failed to import at startup

Overview

Kandinsky 5.0 uses a latent diffusion pipeline with Flow Matching and features:

Diffusion Transformer (DiT): Main generative backbone with cross-attention to text embeddings
Qwen2.5-VL and CLIP: Provide high-quality text embeddings
HunyuanVideo 3D VAE: Encodes and decodes video into a latent space
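
To make the Flow Matching part of the pipeline more concrete, here is a minimal toy sketch of how sampling works in general: a latent starts as noise and is moved along a learned velocity field in a fixed number of steps. This is not Kandinsky's actual sampler. The time convention (t=0 is noise, t=1 is data), the plain Euler integrator, and the names `euler_flow_sample`, `toy_velocity`, and `TARGET` are all assumptions made for illustration; in the real pipeline the velocity would come from the DiT conditioned on text embeddings, and the final latent would be decoded by the VAE.

```python
from typing import Callable, List

Vector = List[float]


def euler_flow_sample(
    velocity: Callable[[Vector, float], Vector],
    x_start: Vector,
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x_start)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)  # one model evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for the DiT: the exact velocity of a straight-line
# path that ends at a known target. A real model would predict this instead.
TARGET = [1.0, -2.0, 0.5]


def toy_velocity(x: Vector, t: float) -> Vector:
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


noise = [0.3, 0.1, -0.7]  # stands in for a randomly drawn latent
latent = euler_flow_sample(toy_velocity, noise, num_steps=16)
print(latent)
```

Each loop iteration calls the velocity function once, which is why the number of sampling steps largely determines how long generation takes.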

The model family includes multiple variants optimized for different use cases:

SFT model: Highest generation quality
CFG-distilled: 2× faster inference
Diffusion-distilled: 6× faster with minimal quality loss (16 steps)
Pretrain model: Designed for fine-tuning

All models are available in 5-second and 10-second video generation versions.

Model variants

Model	Video Duration	NFE	Latency (H100)
Kandinsky 5.0 T2V Lite SFT	5s / 10s	100	139s / 224s
Kandinsky 5.0 T2V Lite no-CFG	5s / 10s	50	77s / 124s
Kandinsky 5.0 T2V Lite distill	5s / 10s	16	35s / 61s
Kandinsky 5.0 I2V Lite	5s	100	673s
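
The table can be turned into speedup figures with a little arithmetic. The sketch below copies the T2V Lite rows and compares each variant against the SFT model, both in wall-clock latency and in NFE (the number of model evaluations per video). The dictionary layout and function names are hypothetical, and treating the "no-CFG" row as the CFG-distilled variant is an assumption.

```python
# NFE and H100 latency in seconds for 5s and 10s clips, copied from the table above.
T2V_LITE = {
    "sft": {"nfe": 100, "latency_s": {"5s": 139, "10s": 224}},
    "no-CFG": {"nfe": 50, "latency_s": {"5s": 77, "10s": 124}},
    "distill": {"nfe": 16, "latency_s": {"5s": 35, "10s": 61}},
}


def speedup_vs_sft(variant: str, duration: str) -> float:
    """Wall-clock speedup of a variant relative to the SFT model."""
    base = T2V_LITE["sft"]["latency_s"][duration]
    return base / T2V_LITE[variant]["latency_s"][duration]


def nfe_reduction_vs_sft(variant: str) -> float:
    """How many times fewer model evaluations a variant needs than SFT."""
    return T2V_LITE["sft"]["nfe"] / T2V_LITE[variant]["nfe"]


for name in ("no-CFG", "distill"):
    for duration in ("5s", "10s"):
        print(
            f"{name} {duration}: {speedup_vs_sft(name, duration):.2f}x faster, "
            f"{nfe_reduction_vs_sft(name):.2f}x fewer evaluations"
        )
```

On these numbers the measured latency improves by roughly 1.8× for no-CFG and about 3.7× to 4.0× for the distilled model, while the NFE drops by 2× and 6.25×. The 2× and 6× figures quoted in the overview therefore appear to track evaluation count more closely than wall-clock time.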

Text-to-Video workflow

1. Download workflow file

Update ComfyUI to the latest version, then open Workflow -> Browse Templates -> Video from the menu and select “Kandinsky 5.0 T2V” to load the workflow.

Download JSON Workflow File

2. Manually download models

Text Encoders

qwen_2.5_vl_7b_fp8_scaled.safetensors
clip_l.safetensors

Diffusion Model

kandinsky5lite_t2v_sft_5s.safetensors

VAE

hunyuan_video_vae_bf16.safetensors

ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │      ├── qwen_2.5_vl_7b_fp8_scaled.safetensors
│   │      └── clip_l.safetensors
│   ├── 📂 diffusion_models/
│   │      └── kandinsky5lite_t2v_sft_5s.safetensors
│   └── 📂 vae/
│          └── hunyuan_video_vae_bf16.safetensors
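
If the workflow reports missing models, a quick way to check the layout above is to test each expected path. This is a small sketch, not part of ComfyUI: `missing_models` and `T2V_FILES` are hypothetical names, and `"ComfyUI"` is a placeholder for your own install directory.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree above.
T2V_FILES = [
    "models/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "models/text_encoders/clip_l.safetensors",
    "models/diffusion_models/kandinsky5lite_t2v_sft_5s.safetensors",
    "models/vae/hunyuan_video_vae_bf16.safetensors",
]


def missing_models(comfyui_root: str, required: List[str] = T2V_FILES) -> List[str]:
    """Return the required files that are not present under comfyui_root."""
    root = Path(comfyui_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    # Placeholder path; point this at your own ComfyUI directory.
    for rel in missing_models("ComfyUI"):
        print(f"missing: {rel}")
```

An empty result means all four files are where the tree expects them. The check only looks at file names and locations, not at whether a download completed correctly.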

Image-to-Video workflow

1. Download workflow file

Update ComfyUI to the latest version, then open Workflow -> Browse Templates -> Video from the menu and select “Kandinsky 5.0 I2V” to load the workflow.

Download JSON Workflow File

2. Manually download models

Text Encoders

qwen_2.5_vl_7b_fp8_scaled.safetensors
clip_l.safetensors

Diffusion Model

kandinsky5lite_i2v_5s.safetensors

VAE

hunyuan_video_vae_bf16.safetensors

ComfyUI/
├── 📂 models/
│   ├── 📂 text_encoders/
│   │      ├── qwen_2.5_vl_7b_fp8_scaled.safetensors
│   │      └── clip_l.safetensors
│   ├── 📂 diffusion_models/
│   │      └── kandinsky5lite_i2v_5s.safetensors
│   └── 📂 vae/
│          └── hunyuan_video_vae_bf16.safetensors
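
Comparing this tree with the Text-to-Video one shows which files the two workflows share. The sketch below does that comparison with plain sets; the variable names are hypothetical and the paths are copied from the two directory trees in this guide.

```python
# Paths relative to ComfyUI/models/, copied from the two directory trees.
T2V = {
    "text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "text_encoders/clip_l.safetensors",
    "diffusion_models/kandinsky5lite_t2v_sft_5s.safetensors",
    "vae/hunyuan_video_vae_bf16.safetensors",
}
I2V = {
    "text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "text_encoders/clip_l.safetensors",
    "diffusion_models/kandinsky5lite_i2v_5s.safetensors",
    "vae/hunyuan_video_vae_bf16.safetensors",
}

shared = sorted(T2V & I2V)    # already in place if Text-to-Video is set up
i2v_only = sorted(I2V - T2V)  # the only extra download for Image-to-Video

print("shared:", shared)
print("I2V only:", i2v_only)
```

If the Text-to-Video models are already downloaded, only the Image-to-Video diffusion model needs to be added; the text encoders and the VAE are the same files.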
