This guide introduces the basic concepts of Pose ControlNet and demonstrates how to generate large-sized images in ComfyUI using a two-pass generation approach.
OpenPose is an open-source real-time multi-person pose estimation system developed by Carnegie Mellon University (CMU), representing a significant breakthrough in the field of computer vision. The system can simultaneously detect multiple people in an image, capturing body keypoints (such as the head, shoulders, elbows, and knees) as well as hand and facial keypoints.
In AI image generation, skeleton structure maps generated by OpenPose serve as conditional inputs for ControlNet, enabling precise control over the posture, actions, and expressions of generated characters. This allows us to generate realistic human figures with the expected poses and actions, greatly improving the controllability and practical value of AI-generated content. For early Stable Diffusion 1.5 series models in particular, OpenPose skeleton maps can effectively prevent distorted poses, limbs, and expressions.
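If you want to produce your own skeleton map from a photo rather than using the input image provided below, a minimal sketch using the community controlnet_aux package (an assumption, not part of this workflow) looks like this:

```python
# Minimal sketch: extracting an OpenPose skeleton map from a photo using
# the community controlnet_aux package (pip install controlnet-aux).
# This is optional; the tutorial provides a ready-made skeleton image.
from PIL import Image
from controlnet_aux import OpenposeDetector

# Downloads the pretrained pose-annotator weights on first use
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

photo = Image.open("reference_photo.png")  # hypothetical input photo
skeleton = detector(photo)                 # returns a PIL image of the skeleton
skeleton.save("pose_skeleton.png")         # usable as the ControlNet input
```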
Please download the workflow image below and drag it into ComfyUI to load the workflow:
Images with workflow JSON in their metadata can be dragged directly into ComfyUI or loaded via the menu Workflows -> Open (Ctrl+O).
This image already includes download links for the corresponding models, and dragging it into ComfyUI will automatically prompt for downloads.
Please download the image below, which we will use as input:
If your network cannot successfully complete the automatic download of the corresponding models, please try manually downloading the models below and placing them in the specified directories:
Follow these steps according to the numbered markers in the image:
1. Ensure the Load Checkpoint node can load majicmixRealistic_v7.safetensors
2. Ensure the Load VAE node can load vae-ft-mse-840000-ema-pruned.safetensors
3. Ensure the Load ControlNet Model node can load control_v11p_sd15_openpose_fp16.safetensors
4. Use the Load Image node to upload the pose input image provided earlier, or use your own OpenPose skeleton map
5. Ensure the second Load Checkpoint node can load japaneseStyleRealistic_v20.safetensors
6. Click the Queue button or use the shortcut Ctrl(cmd) + Enter to execute the image generation (or queue it from a script, as sketched below)
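Outside the UI, the same generation can be queued through ComfyUI's HTTP API. The sketch below is a minimal example, assuming ComfyUI is running on the default 127.0.0.1:8188 and the workflow has been exported in API format; the JSON filename is hypothetical:

```python
# Minimal sketch: queueing a workflow through ComfyUI's HTTP API.
# Assumes ComfyUI is running locally on the default port and the
# workflow was exported in API format (the filename is hypothetical).
import json
import urllib.request

with open("pose_controlnet_two_pass_api.json") as f:
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
request = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    # The response includes the prompt_id of the queued job
    print(json.loads(response.read()))
```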
This workflow uses a two-pass image generation approach, dividing the image creation process into two phases.

In the first phase, the majicmixRealistic_v7 model is combined with Pose ControlNet to generate an initial character pose image:
1. The Load Checkpoint node loads the majicmixRealistic_v7 model
2. The Load ControlNet Model node loads the OpenPose ControlNet model
3. The pose image is processed through the Apply ControlNet node and combined with the positive and negative prompt conditions
4. The first KSampler node (typically using 20-30 steps) generates a basic character pose image
5. The VAE Decode node converts the latent into the first-phase pixel image
This phase primarily focuses on correct character posture, pose, and basic structure, ensuring that the generated character conforms to the input skeletal pose.
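For readers who want to see the same first-phase mechanics outside ComfyUI, here is a rough sketch using the Hugging Face diffusers library. The base checkpoint repo and prompts below are stand-in assumptions; the tutorial's majicmixRealistic_v7 checkpoint could be loaded instead via from_single_file:

```python
# Rough diffusers analogue of the first phase: SD1.5 + OpenPose ControlNet.
# The base repo and prompts are stand-ins, not the tutorial's exact setup.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # stand-in for majicmixRealistic_v7
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose = Image.open("pose_skeleton.png")  # the OpenPose skeleton input image
first_pass = pipe(
    prompt="a woman standing in a park, photorealistic",  # example prompt
    negative_prompt="lowres, bad anatomy, bad hands",
    image=pose,                  # ControlNet conditioning image
    num_inference_steps=25,      # within the 20-30 step range noted above
).images[0]
first_pass.save("first_pass.png")
```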
In the second phase, the output image from the first phase is used as a reference, with the japaneseStyleRealistic_v20 model performing stylization and detail enhancement:
1. The latent from the first phase is enlarged through the Upscale Latent node
2. The second Load Checkpoint node loads the japaneseStyleRealistic_v20 model, which focuses on details and style
3. The second KSampler node uses a lower denoise strength (typically 0.4-0.6) for refinement, preserving the basic structure from the first phase
4. The final image is output through the VAE Decode and Save Image nodes

This phase primarily focuses on style consistency, detail richness, and enhancing overall image quality.
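The same refinement idea can be sketched in diffusers with an image-to-image pass at low strength. This upscales in pixel space rather than latent space, so it only approximates the workflow; the checkpoint repo and prompt are again stand-in assumptions:

```python
# Rough diffusers analogue of the second phase: upscale, then refine with
# a low-strength img2img pass. Pixel-space upscaling approximates the
# workflow's Upscale Latent node; the repo and prompt are stand-ins.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # stand-in for japaneseStyleRealistic_v20
    torch_dtype=torch.float16,
).to("cuda")

first_pass = Image.open("first_pass.png")
width, height = first_pass.size
upscaled = first_pass.resize((width * 2, height * 2), Image.LANCZOS)

refined = pipe(
    prompt="a woman standing in a park, japanese style, highly detailed",  # example prompt
    image=upscaled,
    strength=0.5,            # within the 0.4-0.6 denoise range noted above
    num_inference_steps=25,
).images[0]
refined.save("second_pass.png")
```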
Compared to single-pass generation, this two-pass approach produces larger images while keeping the pose accurate: the first pass locks in correct structure under ControlNet guidance, the second pass adds style and detail at a higher resolution, and each phase is free to use the checkpoint best suited to its job.
To learn more about techniques for mixing multiple ControlNets, please refer to the Mixing ControlNet Models tutorial.