Images, latents, and masks are all represented as instances of the torch.Tensor class. Complete documentation is here, or an introduction to the key concepts required for Comfy here.
When returning a single image from a node, return (image,) not (image) - a one-element tuple requires the trailing comma, while (image) is just the bare value in parentheses.
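A quick illustration of why the trailing comma matters (the variable name is just a placeholder):

```python
image = "placeholder"   # stands in for a torch.Tensor IMAGE

single = (image,)       # a tuple of length 1 - what Comfy expects
not_a_tuple = (image)   # just `image` wrapped in parentheses

print(isinstance(single, tuple))       # True
print(isinstance(not_a_tuple, tuple))  # False
```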
An IMAGE is a torch.Tensor with shape [B,H,W,C], with C=3. If you are going to save or load images, you will need to convert to and from PIL.Image format - see the code snippets below! Note that some pytorch operations offer (or expect) [B,C,H,W], known as 'channel first', for reasons of computational efficiency. Just be careful.
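A minimal sketch of the conversions described above (the helper names are illustrative, not part of any Comfy API), plus the channel-first permute:

```python
import numpy as np
import torch
from PIL import Image

def tensor_to_pil(t: torch.Tensor) -> Image.Image:
    """Convert a [B,H,W,C] float tensor (values 0-1) to a PIL Image.
    Only the first image in the batch is converted."""
    arr = np.clip(255.0 * t[0].cpu().numpy(), 0, 255).astype(np.uint8)
    return Image.fromarray(arr)

def pil_to_tensor(img: Image.Image) -> torch.Tensor:
    """Convert a PIL Image to a [1,H,W,C] float tensor with values 0-1."""
    arr = np.asarray(img.convert("RGB")).astype(np.float32) / 255.0
    return torch.from_numpy(arr).unsqueeze(0)

# Round trip: tensor -> PIL -> tensor preserves the IMAGE shape
image = torch.rand(1, 4, 4, 3)
back = pil_to_tensor(tensor_to_pil(image))
print(back.shape)  # torch.Size([1, 4, 4, 3])

# Some torch ops expect 'channel first' [B,C,H,W]; permute to convert
channel_first = image.permute(0, 3, 1, 2)
print(channel_first.shape)  # torch.Size([1, 3, 4, 4])
```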
A MASK is a torch.Tensor with shape [B,H,W]. In many contexts, masks have binary values (0 or 1), which are used to indicate which pixels should undergo specific operations. In some cases, values between 0 and 1 are used to indicate an extent of masking (for instance, to alter transparency, adjust filters, or composite layers).
The LoadImage node uses an image's alpha channel (the "A" in "RGBA") to create MASKs. The values from the alpha channel are normalized to the range [0,1] (torch.float32) and then inverted. The LoadImage node always produces a MASK output when loading an image, but many images (like JPEGs) don't have an alpha channel; in these cases, LoadImage creates a default mask with the shape [1, 64, 64].
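A sketch of the behaviour just described - not LoadImage's actual source. The function name is illustrative, and the all-zeros fill for the default mask is an assumption:

```python
import numpy as np
import torch
from PIL import Image

def mask_from_image(img: Image.Image) -> torch.Tensor:
    """Build a MASK the way described above: the alpha channel is
    normalized to [0,1] and inverted; images without an alpha channel
    get a default [1,64,64] mask (assumed all zeros here)."""
    if "A" in img.getbands():
        alpha = np.asarray(img.getchannel("A")).astype(np.float32) / 255.0
        return 1.0 - torch.from_numpy(alpha).unsqueeze(0)  # [1,H,W]
    return torch.zeros((1, 64, 64), dtype=torch.float32)

rgba = Image.new("RGBA", (8, 8), (0, 0, 0, 255))  # fully opaque alpha
print(mask_from_image(rgba).shape)  # torch.Size([1, 8, 8])

rgb = Image.new("RGB", (8, 8))      # no alpha channel
print(mask_from_image(rgb).shape)   # torch.Size([1, 64, 64])
```

Note that a fully opaque alpha (255) normalizes to 1.0 and inverts to 0.0, so opaque regions end up unmasked.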
In libraries such as numpy, PIL, and many others, single-channel images (like masks) are typically represented as 2D arrays, with shape [H,W]. This means the C (channel) dimension is implicit, and thus, unlike IMAGE types, batches of MASKs have only three dimensions: [B, H, W].
It is not uncommon to encounter a mask which has had the B dimension implicitly squeezed, giving a tensor of shape [H,W]. To use a MASK, you will often have to match shapes by unsqueezing to produce a shape [B,H,W,C] with C=1. To unsqueeze the C dimension, use unsqueeze(-1); to unsqueeze B, use unsqueeze(0). If your node receives a MASK as input, you would be wise to always check len(mask.shape).
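The shape checks above can be wrapped in a small helper (the function name is illustrative):

```python
import torch

def normalize_mask(mask: torch.Tensor) -> torch.Tensor:
    """Return the mask as [B,H,W,C] with C=1, whatever shape it arrived in."""
    if len(mask.shape) == 2:       # [H,W] - batch dimension was squeezed
        mask = mask.unsqueeze(0)   # -> [B,H,W]
    if len(mask.shape) == 3:       # [B,H,W] - the standard MASK shape
        mask = mask.unsqueeze(-1)  # -> [B,H,W,C]
    return mask

print(normalize_mask(torch.zeros(64, 64)).shape)     # torch.Size([1, 64, 64, 1])
print(normalize_mask(torch.zeros(2, 64, 64)).shape)  # torch.Size([2, 64, 64, 1])
```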
A LATENT is a dict; the latent sample is referenced by the key samples and has shape [B,C,H,W], with C=4.
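So unlike IMAGE, the samples tensor is 'channel first'. A minimal sketch of unpacking it (the sizes here are arbitrary):

```python
import torch

# A LATENT is a dict; the tensor lives under the key 'samples'
latent = {"samples": torch.zeros(1, 4, 64, 64)}  # [B,C,H,W], channel first

b, c, h, w = latent["samples"].shape
print(c)  # 4
```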