WanTextToVideoApi - ComfyUI Built-in Node Documentation

The Wan Text to Video node generates video content based on text descriptions. It uses AI models to create videos from prompts and supports various video sizes, durations, and optional audio inputs. The node can automatically generate audio when needed and provides options for prompt enhancement and watermarking.

Inputs

Parameter	Description	Data Type	Required	Range
`model`	Model to use (default: "wan2.6-t2v")	COMBO	Yes	"wan2.5-t2v-preview" "wan2.6-t2v"
`prompt`	Prompt describing the elements and visual features. Supports English and Chinese (default: "")	STRING	Yes	-
`negative_prompt`	Negative prompt describing what to avoid (default: "")	STRING	No	-
`size`	Video resolution and aspect ratio (default: "720p: 1:1 (960x960)")	COMBO	No	"480p: 1:1 (624x624)" "480p: 16:9 (832x480)" "480p: 9:16 (480x832)" "720p: 1:1 (960x960)" "720p: 16:9 (1280x720)" "720p: 9:16 (720x1280)" "720p: 4:3 (1088x832)" "720p: 3:4 (832x1088)" "1080p: 1:1 (1440x1440)" "1080p: 16:9 (1920x1080)" "1080p: 9:16 (1080x1920)" "1080p: 4:3 (1632x1248)" "1080p: 3:4 (1248x1632)"
`duration`	Duration of the video in seconds. A 15-second duration is available only for the Wan 2.6 model (default: 5)	INT	No	5-15 (in steps of 5)
`audio`	Optional audio input. It must contain a clear, loud voice without extraneous noise or background music, and be between 3.0 and 29.0 seconds long	AUDIO	No	-
`seed`	Seed to use for generation (default: 0)	INT	No	0-2147483647
`generate_audio`	If no audio input is provided, generate audio automatically (default: False)	BOOLEAN	No	-
`prompt_extend`	Whether to enhance the prompt with AI assistance (default: True)	BOOLEAN	No	-
`watermark`	Whether to add an AI-generated watermark to the result (default: False)	BOOLEAN	No	-
`shot_type`	Shot type for the generated video: a single continuous shot or multiple shots with cuts. Takes effect only when `prompt_extend` is True (default: "single")	COMBO	No	"single" "multi"
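
Each `size` option packs a resolution tier, an aspect ratio, and pixel dimensions into one label. The following is a minimal sketch of splitting such a label into its parts, assuming the labels have exactly the form listed in the table. `SizeOption` and `parse_size` are hypothetical helper names for illustration, not part of ComfyUI, and this is not necessarily how the node reads the value internally.

```python
import re
from typing import NamedTuple


class SizeOption(NamedTuple):
    tier: str      # e.g. "720p"
    aspect: str    # e.g. "16:9"
    width: int     # pixels
    height: int    # pixels


# Assumed label shape, taken from the option list: "<tier>: <w>:<h> (<W>x<H>)"
_SIZE_RE = re.compile(r"^(\d+p): (\d+:\d+) \((\d+)x(\d+)\)$")


def parse_size(label: str) -> SizeOption:
    """Split a size label such as '720p: 16:9 (1280x720)' into its fields."""
    match = _SIZE_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized size label: {label!r}")
    tier, aspect, width, height = match.groups()
    return SizeOption(tier, aspect, int(width), int(height))
```

For example, `parse_size("1080p: 4:3 (1632x1248)")` yields a tuple whose `width` is 1632 and whose `height` is 1248.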

Note: The Wan 2.6 model does not support 480p resolutions. A 15-second duration is only supported by the Wan 2.6 model. When providing audio input, it must be between 3.0 and 29.0 seconds in duration and contain clear voice without background noise or music.
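
The constraints in the note above can be checked before a job is queued. The sketch below is illustrative only: `check_wan_t2v_inputs` is a hypothetical helper, not part of ComfyUI, and the limits it encodes are copied from the table and note on this page. It assumes the allowed durations are 5, 10, and 15 seconds, reading "5-15 (in steps of 5)" literally.

```python
from typing import List, Optional

# Limits copied from this page; the helper itself is a hypothetical illustration.
MODELS_WITH_15S = {"wan2.6-t2v"}        # only Wan 2.6 supports 15-second videos
MODELS_WITHOUT_480P = {"wan2.6-t2v"}    # Wan 2.6 does not support 480p sizes
AUDIO_MIN_SECONDS = 3.0
AUDIO_MAX_SECONDS = 29.0


def check_wan_t2v_inputs(
    model: str,
    size: str,
    duration: int,
    audio_seconds: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means no conflict was found."""
    problems = []

    # Assumption: the 5-15 range in steps of 5 means exactly 5, 10, or 15.
    if duration not in (5, 10, 15):
        problems.append(f"duration {duration} is not 5, 10, or 15 seconds")
    elif duration == 15 and model not in MODELS_WITH_15S:
        problems.append(f"{model} does not support a 15-second duration")

    # Size labels start with the resolution tier, e.g. "480p: 16:9 (832x480)".
    if size.startswith("480p") and model in MODELS_WITHOUT_480P:
        problems.append(f"{model} does not support 480p sizes")

    # The audio length is only checked when an audio input is supplied.
    if audio_seconds is not None and not (
        AUDIO_MIN_SECONDS <= audio_seconds <= AUDIO_MAX_SECONDS
    ):
        problems.append(
            f"audio length {audio_seconds}s is outside "
            f"{AUDIO_MIN_SECONDS}-{AUDIO_MAX_SECONDS}s"
        )

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every conflicting setting at once.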

Outputs

Output Name	Description	Data Type
`output`	The generated video based on the input parameters	VIDEO

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

Source fingerprint (SHA-256): 105ff4b575dc6cd319448b0f4c0e9cfab360df025480190ae6bccc324d351525
