Hunyuan-Video
Tencent has released open-source video models for text-to-video and image-to-video generation.
Image to Video
Using Hunyuan I2V, you can transform still images into fluid, high-quality videos. Try it with this starting frame.
Drag the video directly into ComfyUI to run the workflow.
Unified Image & Video Architecture
The “Dual-stream to Single-stream” Transformer efficiently fuses text, images, and motion information, enhancing consistency, quality, and alignment across the generated video frames.
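As a rough illustration of that design, the sketch below runs video and text tokens through a dual-stream block (separate weights, interaction only via joint attention) and then fuses them into a single-stream block. The dimensions, block counts, and module choices are illustrative assumptions, not HunyuanVideo's actual implementation:

```python
import torch
import torch.nn as nn

class DualStreamBlock(nn.Module):
    """Video and text tokens keep separate weights; they interact only through joint attention."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm_v = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp_v = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.mlp_t = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, video, text):
        # Attend over the concatenated sequence, then split the result back per stream.
        joint = torch.cat([self.norm_v(video), self.norm_t(text)], dim=1)
        out, _ = self.attn(joint, joint, joint)
        v_out, t_out = out.split([video.shape[1], text.shape[1]], dim=1)
        video = video + v_out + self.mlp_v(self.norm_v(video))
        text = text + t_out + self.mlp_t(self.norm_t(text))
        return video, text

class SingleStreamBlock(nn.Module):
    """After fusion, one shared transformer layer processes the combined token sequence."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(dim, heads, dim_feedforward=4 * dim, batch_first=True)

    def forward(self, tokens):
        return self.layer(tokens)

# Dual-stream phase first, then concatenate and run the single-stream phase.
dim, heads = 128, 8
video = torch.randn(1, 256, dim)   # latent video tokens (illustrative length)
text = torch.randn(1, 77, dim)     # text tokens (illustrative length)

video, text = DualStreamBlock(dim, heads)(video, text)
fused = SingleStreamBlock(dim, heads)(torch.cat([video, text], dim=1))
print(fused.shape)  # torch.Size([1, 333, 128])
```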
Superior Text-Video-Image Alignment
The MLLM text encoder outperforms traditional encoders like CLIP and T5, offering better instruction following, detail capture, and complex reasoning when combined with image inputs.
Efficient Video Compression
A custom 3D VAE compresses videos into a compact latent space, preserving resolution and frame rate while greatly reducing the number of tokens, which makes Image-to-Video generation more efficient.
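To make the token savings concrete, here is a back-of-the-envelope calculation. The compression factors (4x temporal, 8x spatial, 16 latent channels) follow the published HunyuanVideo description and should be treated as assumptions to verify against the model you are actually running:

```python
# Rough latent-shape calculation for a 3D VAE with 4x temporal and 8x8 spatial compression.
frames, height, width = 129, 720, 1280   # example 720p clip, ~5 s at ~25 fps
ct, cs, latent_channels = 4, 8, 16       # assumed compression factors

latent_frames = (frames - 1) // ct + 1   # causal VAE keeps the first frame separate
latent_h, latent_w = height // cs, width // cs

pixel_values = frames * height * width * 3
latent_values = latent_frames * latent_h * latent_w * latent_channels

print(latent_frames, latent_h, latent_w)                  # 33 90 160
print(f"reduction: {pixel_values / latent_values:.0f}x")  # ~47x fewer values
```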
Requirements
Download the following model files and place them in the corresponding subfolders of your ComfyUI models directory (see the layout sketch after the list):
- llava_llama3_vision.safetensors
- clip_l.safetensors
- llava_llama3_fp16.safetensors
- llava_llama3_fp8_scaled.safetensors
- hunyuan_video_vae_bf16.safetensors
- hunyuan_video_image_to_video_720p_bf16.safetensors
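The layout below follows the standard ComfyUI models folder convention; it is a reasonable guess rather than an official mapping, so check the ComfyUI release notes or example workflows if a loader node cannot find a file:

```
ComfyUI/models/
├── clip_vision/
│   └── llava_llama3_vision.safetensors
├── text_encoders/
│   ├── clip_l.safetensors
│   ├── llava_llama3_fp16.safetensors
│   └── llava_llama3_fp8_scaled.safetensors
├── vae/
│   └── hunyuan_video_vae_bf16.safetensors
└── diffusion_models/
    └── hunyuan_video_image_to_video_720p_bf16.safetensors
```

Only one of the two LLaVA text encoders is needed at a time; the fp8_scaled variant is the usual choice when VRAM is limited.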