ComfyUI Pose ControlNet Usage Example
This guide introduces the basic concepts of Pose ControlNet and demonstrates how to generate large-sized images in ComfyUI using a two-pass generation approach.
Introduction to OpenPose
OpenPose is an open-source real-time multi-person pose estimation system developed by Carnegie Mellon University (CMU), representing a significant breakthrough in the field of computer vision. The system can simultaneously detect multiple people in an image, capturing:
- Body skeleton: 18 keypoints, including head, shoulders, elbows, wrists, hips, knees, and ankles
- Facial expressions: 70 facial keypoints for capturing micro-expressions and facial contours
- Hand details: 21 hand keypoints for precisely expressing finger positions and gestures
- Foot posture: 6 foot keypoints, recording standing postures and movement details
In AI image generation, the skeleton maps produced by OpenPose serve as conditioning inputs for ControlNet, enabling precise control over the pose, actions, and expressions of generated characters. This makes it possible to generate realistic human figures in the poses we intend, greatly improving the controllability and practical value of AI-generated content. For early Stable Diffusion 1.5 series models in particular, OpenPose skeleton maps are an effective way to prevent distorted limbs, poses, and expressions.
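If you want to create a skeleton map from a photo instead of using the one provided below, a pose annotator can do the extraction. The following is a minimal sketch using the controlnet_aux Python package; the repository id, keyword flags, and file paths are assumptions based on common usage, so adjust them to your setup.

```python
# Minimal sketch: extract an OpenPose skeleton map from a photo with
# controlnet_aux (pip install controlnet_aux). Paths are hypothetical.
from controlnet_aux import OpenposeDetector
from PIL import Image

# Pretrained pose annotator weights; the repo id is an assumption based on
# common usage of the library.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

source = Image.open("reference_photo.png")  # any photo of a person

# include_hand / include_face are assumed keyword names; older releases of
# controlnet_aux expose a single hand_and_face flag instead.
skeleton = detector(source, include_hand=True, include_face=True)

# The saved skeleton map is what ControlNet conditions on.
skeleton.save("pose_skeleton.png")
```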
ComfyUI 2-Pass Pose ControlNet Usage Example
1. Pose ControlNet Workflow Assets
Please download the workflow image below and drag it into ComfyUI to load the workflow:
Images with workflow JSON in their metadata can be dragged directly into ComfyUI or loaded via the menu Workflows -> Open (Ctrl+O).
This image already includes download links for the corresponding models, and dragging it into ComfyUI will automatically prompt you to download them.
Please download the image below, which we will use as input:
2. Manual Model Installation
If the automatic model download fails due to network issues, please download the models below manually and place them in the specified directories:
- control_v11p_sd15_openpose_fp16.safetensors
- majicmixRealistic_v7.safetensors
- japaneseStyleRealistic_v20.safetensors
- vae-ft-mse-840000-ema-pruned.safetensors
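In a standard ComfyUI installation, checkpoints, VAEs, and ControlNet models live in separate subfolders of the models directory, so the files above should end up roughly as follows:

```
ComfyUI/
├── models/
│   ├── checkpoints/
│   │   ├── majicmixRealistic_v7.safetensors
│   │   └── japaneseStyleRealistic_v20.safetensors
│   ├── vae/
│   │   └── vae-ft-mse-840000-ema-pruned.safetensors
│   └── controlnet/
│       └── control_v11p_sd15_openpose_fp16.safetensors
```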
3. Step-by-Step Workflow Execution
Follow these steps according to the numbered markers in the image:
- Ensure that the first Load Checkpoint node can load majicmixRealistic_v7.safetensors
- Ensure that the Load VAE node can load vae-ft-mse-840000-ema-pruned.safetensors
- Ensure that the Load ControlNet Model node can load control_v11p_sd15_openpose_fp16.safetensors
- Click the select button in the Load Image node to upload the pose input image provided earlier, or use your own OpenPose skeleton map
- Ensure that the second Load Checkpoint node can load japaneseStyleRealistic_v20.safetensors
- Click the Queue button or use the shortcut Ctrl(cmd) + Enter to execute the image generation (a programmatic alternative is sketched after this list)
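Clicking Queue is all that is needed, but for completeness, the same workflow can also be queued over ComfyUI's HTTP API. The sketch below assumes a default local server on port 8188 and a workflow that has been exported in API format; the filename is hypothetical.

```python
# Minimal sketch: queue an exported (API-format) workflow against a local
# ComfyUI server. Assumes the default address http://127.0.0.1:8188.
import json
import urllib.request

with open("pose_controlnet_workflow_api.json") as f:  # hypothetical export
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
request = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    # On success the server responds with the queued prompt_id.
    print(response.read().decode("utf-8"))
```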
Explanation of the Pose ControlNet 2-Pass Workflow
This workflow uses a two-pass image generation approach, dividing the image creation process into two phases:
First Phase: Basic Pose Image Generation
In the first phase, the majicmixRealistic_v7 model is combined with Pose ControlNet to generate an initial character pose image:
- First, load the majicmixRealistic_v7 model via the Load Checkpoint node
- Load the pose control model through the Load ControlNet Model node
- The input pose image is fed into the Apply ControlNet node and combined with the positive and negative prompt conditions
- The first KSampler node (typically using 20-30 steps) generates a basic character pose image
- The first-phase image is decoded to pixel space through VAE Decode
This phase primarily focuses on correct character pose and basic structure, ensuring that the generated character conforms to the input skeleton map.
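For readers who think more easily in code, the sketch below reproduces the gist of this first pass with the diffusers library rather than ComfyUI nodes. The base checkpoint, prompts, and file names are stand-ins (the workflow itself uses majicmixRealistic_v7), so treat it as an illustration of the idea, not the exact workflow.

```python
# Rough diffusers equivalent of the first pass: SD 1.5 + OpenPose ControlNet.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # stand-in for majicmixRealistic_v7
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose_map = load_image("pose_skeleton.png")  # the OpenPose skeleton input

first_pass = pipe(
    prompt="a woman standing in a park, photorealistic",  # hypothetical
    negative_prompt="lowres, bad anatomy, extra limbs",
    image=pose_map,              # ControlNet conditioning image
    num_inference_steps=25,      # the workflow uses roughly 20-30 steps
).images[0]
first_pass.save("first_pass.png")
```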
Second Phase: Style Optimization and Detail Enhancement
In the second phase, the output image from the first phase is used as a reference, with the japaneseStyleRealistic_v20 model performing stylization and detail enhancement:
- The image generated in the first phase is upscaled to a larger-resolution latent through the Upscale Latent node
- The second Load Checkpoint node loads the japaneseStyleRealistic_v20 model, which focuses on details and style
- The second KSampler node uses a lower denoise strength (typically 0.4-0.6) for refinement, preserving the basic structure from the first phase
- Finally, a higher-quality, larger-resolution image is output through the second VAE Decode and the Save Image node
This phase primarily focuses on style consistency, detail richness, and enhancing overall image quality.
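Continuing the diffusers analogy, the second pass behaves like image-to-image refinement at a higher resolution: the first-pass image is upscaled and then re-denoised with a low strength so the pose survives. Again, the checkpoint and prompt below are stand-ins (the workflow uses japaneseStyleRealistic_v20 and upscales in latent space rather than pixel space).

```python
# Rough sketch of the second pass as low-strength img2img refinement.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

refiner = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # stand-in for japaneseStyleRealistic_v20
    torch_dtype=torch.float16,
).to("cuda")

base = Image.open("first_pass.png")
# Upscale before refinement, mirroring the Upscale Latent node
# (the actual workflow upscales the latent rather than the pixels).
upscaled = base.resize((base.width * 2, base.height * 2), Image.LANCZOS)

refined = refiner(
    prompt="a woman standing in a park, detailed skin, soft lighting",
    image=upscaled,
    strength=0.5,                # plays the role of the 0.4-0.6 denoise
    num_inference_steps=25,
).images[0]
refined.save("second_pass.png")
```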
Advantages of 2-Pass Image Generation
Compared to single-pass generation, the two-pass image generation method offers the following advantages:
- Higher Resolution: Two-pass processing can generate high-resolution images beyond the capabilities of single-pass generation
- Style Blending: Can combine advantages of different models, such as using a realistic model in the first phase and a stylized model in the second phase
- Better Details: The second phase can focus on optimizing details without having to worry about overall structure
- Precise Control: Once pose control is completed in the first phase, the second phase can focus on refining style and details
- Reduced GPU Load: Generating in two passes allows for high-quality large images with limited GPU resources
To learn more about techniques for mixing multiple ControlNets, please refer to the Mixing ControlNet Models tutorial.