MakeTrainingDataset - ComfyUI Built-in Node Documentation

This node prepares data for training by encoding images and text. It takes a list of images and a corresponding list of text captions, then uses a VAE model to convert the images into latent representations and a CLIP model to convert the text into conditioning data. The resulting paired latents and conditioning are output as lists, ready for use in training workflows.

Inputs

Parameter	Description	Data Type	Required	Range
`images`	List of images to encode.	IMAGE	Yes	N/A
`vae`	VAE model for encoding images to latents.	VAE	Yes	N/A
`clip`	CLIP model for encoding text to conditioning.	CLIP	Yes	N/A
`texts`	List of text captions. Its length can be n (one caption per image), 1 (the single caption is repeated for every image), or 0 when omitted (an empty string is used for every image).	STRING	No	N/A

Parameter Constraints:

The `texts` list must contain 0 items, 1 item, or exactly as many items as the `images` list. If it contains 0 items, an empty string is used for all images. If it contains 1 item, that single text is repeated for all images.
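The length rule above can be sketched in plain Python. This is an illustrative sketch only, not the node's actual source: the helper name `broadcast_captions` is hypothetical, and raising `ValueError` on a mismatched length is an assumption about how an invalid input might be rejected.

```python
from typing import List, Optional, Sequence


def broadcast_captions(num_images: int,
                       texts: Optional[Sequence[str]] = None) -> List[str]:
    """Return one caption per image, following the 0 / 1 / n rule.

    Hypothetical helper for illustration; not part of ComfyUI.
    """
    captions = list(texts) if texts is not None else []

    if len(captions) == 0:
        # No captions supplied: every image gets an empty string.
        return [""] * num_images
    if len(captions) == 1:
        # A single caption is repeated for every image.
        return captions * num_images
    if len(captions) == num_images:
        # One caption per image: use them as given.
        return captions

    # Assumption: any other length is treated as an error.
    raise ValueError(
        f"Expected 0, 1, or {num_images} captions, got {len(captions)}"
    )
```

For example, `broadcast_captions(3, ["a photo"])` yields three copies of `"a photo"`, while passing two captions for three images raises an error in this sketch.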

Outputs

Output Name	Description	Data Type
`latents`	List of latent dicts, one per input image.	LATENT
`conditioning`	List of conditioning lists, one per input image, in the same order as `latents`.	CONDITIONING
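To show how the two output lists line up, here is a minimal Python sketch that uses stand-in encoders in place of the real VAE and CLIP models. Everything in it is an assumption made for illustration: the function `make_pairs`, the callables `encode_image` and `encode_text`, the `"samples"` key in each latent dict, and the `[[cond, {}]]` shape of each conditioning entry. It is not the node's implementation.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def make_pairs(
    images: Sequence[Any],
    captions: Sequence[str],
    encode_image: Callable[[Any], Any],   # stand-in for VAE encoding
    encode_text: Callable[[str], Any],    # stand-in for CLIP encoding
) -> Tuple[List[Dict[str, Any]], List[Any]]:
    """Encode images and captions into two index-aligned lists."""
    if len(images) != len(captions):
        raise ValueError("images and captions must have the same length")

    # One latent dict per image (the "samples" key is an assumed layout).
    latents = [{"samples": encode_image(img)} for img in images]
    # One conditioning entry per caption, in the same order as the images.
    conditioning = [encode_text(text) for text in captions]
    return latents, conditioning


# Usage with toy stand-ins: strings play the role of images and tensors.
latents, conditioning = make_pairs(
    ["img0", "img1"],
    ["a cat", "a dog"],
    encode_image=lambda img: img.upper(),
    encode_text=lambda text: [[text, {}]],
)
```

In this sketch, `latents[i]` and `conditioning[i]` always refer to the same image and caption pair, which is the property a training workflow relies on.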

