The WanCameraImageToVideo node prepares conditioning and latent data for video generation from images. It takes positive and negative conditioning prompts, along with optional starting images and camera controls, and outputs modified conditioning and an empty latent tensor ready for a video model to fill in.

Inputs

| Parameter | Description | Data Type | Required | Range |
|---|---|---|---|---|
| positive | Positive conditioning prompts for video generation | CONDITIONING | Yes | - |
| negative | Negative conditioning prompts to avoid in video generation | CONDITIONING | Yes | - |
| vae | VAE model for encoding images to latent space | VAE | Yes | - |
| width | Output video width in pixels (default: 832, step: 16) | INT | Yes | 16 to MAX_RESOLUTION |
| height | Output video height in pixels (default: 480, step: 16) | INT | Yes | 16 to MAX_RESOLUTION |
| length | Number of frames in the video sequence (default: 81, step: 4) | INT | Yes | 1 to MAX_RESOLUTION |
| batch_size | Number of videos to generate simultaneously (default: 1) | INT | Yes | 1 to 4096 |
| clip_vision_output | Optional CLIP vision output for additional conditioning | CLIP_VISION_OUTPUT | No | - |
| start_image | Optional starting image to initialize the video sequence. When provided, the first frames of the video will be based on this image, with a mask applied to blend the starting frames with generated content. The image is resized to match the specified width and height. | IMAGE | No | - |
| camera_conditions | Optional camera embedding conditions for video generation. When provided, these conditions are applied to both positive and negative conditioning. | WAN_CAMERA_EMBEDDING | No | - |

Note: When start_image is provided, the node uses it to initialize the video sequence and applies a mask to blend the starting frames with generated content. The camera_conditions and clip_vision_output parameters are optional; when either is provided, it modifies both the positive and the negative conditioning.

Outputs

| Output Name | Description | Data Type |
|---|---|---|
| positive | Modified positive conditioning with applied camera conditions, clip vision outputs, and/or starting image data | CONDITIONING |
| negative | Modified negative conditioning with applied camera conditions, clip vision outputs, and/or starting image data | CONDITIONING |
| latent | Generated empty video latent representation for use with video models. The latent tensor has dimensions [batch_size, 16, frames, height/8, width/8] where frames is calculated as ((length - 1) // 4) + 1. | LATENT |
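
As a rough illustration of the latent dimensions described for the latent output, the sketch below computes the expected shape from the node's inputs. The function name `wan_latent_shape` is hypothetical and not part of the node's code, and the use of integer division for height/8 and width/8 is an assumption (it is exact when both values are multiples of 16, as the step size suggests).

```python
def wan_latent_shape(width=832, height=480, length=81, batch_size=1):
    """Return the latent shape [batch_size, 16, frames, height/8, width/8].

    Hypothetical helper for illustration only; defaults mirror the
    documented defaults of the node's inputs.
    """
    # Frame count as documented: ((length - 1) // 4) + 1
    frames = ((length - 1) // 4) + 1
    # Assumption: spatial dimensions are reduced by integer division by 8.
    return (batch_size, 16, frames, height // 8, width // 8)


# With the documented defaults (832 x 480, 81 frames, batch size 1):
print(wan_latent_shape())  # (1, 16, 21, 60, 104)
```

With the default settings, 81 input frames therefore correspond to 21 latent frames, and an 832 x 480 video corresponds to a 104 x 60 latent grid.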

Source fingerprint (SHA-256): e2309b40f78d5a2487242f1684f82d9e4dd8405ef256615f82da2f701418fd4a