In AI image generation, a single control condition often fails to meet the requirements of complex scenes. Mixing multiple ControlNets allows you to control different regions or aspects of an image simultaneously, achieving more precise control over image generation.

In certain scenarios, mixing ControlNets can leverage the characteristics of different control conditions to achieve more refined conditional control:

  1. Scene Complexity: Complex scenes require multiple control conditions working together
  2. Fine-grained Control: By adjusting the strength parameter of each ControlNet, you can precisely control the degree of influence for each part
  3. Complementary Effects: Different types of ControlNets can complement each other, compensating for the limitations of single controls
  4. Creative Expression: Combining different controls can produce unique creative effects

How to Mix ControlNets

When multiple ControlNets are mixed, each one influences the image generation process within the scope of its control condition. In ComfyUI, this is done by chaining Apply ControlNet nodes: the conditioning output of one node feeds into the next, so the conditions are applied sequentially in layers:

ComfyUI ControlNet Regional Division Mixing Example

In this example, we will use a combination of Pose ControlNet and Scribble ControlNet to generate a scene containing multiple elements: a character on the left controlled by Pose ControlNet and a cat on a scooter on the right controlled by Scribble ControlNet.
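This chaining is visible in the workflow's API-format JSON. The fragment below is a minimal sketch (node IDs, the image-loader references, and the prompt text are illustrative, not the exact IDs of the downloadable workflow): the second ControlNetApply node takes the first node's conditioning output as its input, so both conditions are layered onto the same prompt.

```python
# Illustrative ComfyUI API-format fragment: two ControlNetApply nodes chained.
# Node IDs ("3", "10", ...) and the upstream references ("1", "20", "21")
# are placeholders for this sketch.
workflow = {
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 1],
                     "text": "A woman in red dress, a cat riding a scooter"}},
    "10": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "control_v11p_sd15_openpose_fp16.safetensors"}},
    "11": {"class_type": "ControlNetApply",
           "inputs": {"conditioning": ["3", 0],   # base positive conditioning
                      "control_net": ["10", 0],
                      "image": ["20", 0],          # pose image (left region)
                      "strength": 1.0}},
    "12": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "control_v11p_sd15_scribble_fp16.safetensors"}},
    "13": {"class_type": "ControlNetApply",
           "inputs": {"conditioning": ["11", 0],  # chained: consumes node 11's output
                      "control_net": ["12", 0],
                      "image": ["21", 0],          # scribble image (right region)
                      "strength": 1.0}},
}
# The KSampler's positive conditioning would then reference ["13", 0],
# receiving both control conditions at once.
```

The key detail is that node "13" reads its conditioning from node "11" rather than directly from the text encoder; that is what makes the two controls stack instead of replacing each other.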

1. ControlNet Mixing Workflow Assets

Please download the workflow image below and drag it into ComfyUI to load the workflow:

This workflow image contains metadata and can be dragged directly into ComfyUI, or loaded via the menu Workflows -> Open (Ctrl+O). The system will automatically detect the required models and prompt you to download them.

Input pose image (controls the character pose on the left):

Input scribble image (controls the cat and scooter on the right):

2. Manual Model Installation

If the automatic model download fails because of network restrictions, download the models below manually and place them in the indicated directories:

ComfyUI/
├── models/
│   ├── checkpoints/
│   │   └── awpainting_v14.safetensors
│   ├── controlnet/
│   │   ├── control_v11p_sd15_scribble_fp16.safetensors
│   │   └── control_v11p_sd15_openpose_fp16.safetensors
│   ├── vae/
│   │   └── vae-ft-mse-840000-ema-pruned.safetensors

3. Step-by-Step Workflow Execution

Follow these steps according to the numbered markers in the image:

  1. Ensure that Load Checkpoint can load awpainting_v14.safetensors
  2. Ensure that Load VAE can load vae-ft-mse-840000-ema-pruned.safetensors

First ControlNet group, using the Openpose model:

  3. Ensure that Load ControlNet Model loads control_v11p_sd15_openpose_fp16.safetensors
  4. Click Upload in the Load Image node to upload the pose image provided earlier

Second ControlNet group, using the Scribble model:

  5. Ensure that Load ControlNet Model loads control_v11p_sd15_scribble_fp16.safetensors
  6. Click Upload in the Load Image node to upload the scribble image provided earlier
  7. Click the Queue button or use the shortcut Ctrl(cmd) + Enter to execute the image generation
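If you want to automate this instead of clicking Queue, the same workflow can be submitted over ComfyUI's local HTTP API. This is a hedged sketch assuming a ComfyUI server running at its default address (127.0.0.1:8188) and a workflow dict exported via ComfyUI's Save (API Format) option; the helper names are illustrative.

```python
import json
import urllib.request

def build_payload(workflow: dict, client_id: str = "mix-demo") -> bytes:
    """Wrap an API-format workflow dict in the JSON body that /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

def queue_prompt(workflow: dict, server: str = "127.0.0.1:8188") -> dict:
    """POST the workflow to a running ComfyUI server and return its response.

    Requires a live ComfyUI instance; the default address is assumed here.
    """
    req = urllib.request.Request(
        f"http://{server}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The server's response includes a prompt ID you can use to poll for the finished image, which is convenient when batch-testing different strength values.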

Workflow Explanation

Strength Balance

When controlling different regions of an image, balancing the strength parameters is particularly important:

  • If one region's ControlNet strength is significantly higher than the other's, its control effect may dominate and suppress the other region
  • It’s recommended to set similar strength values for ControlNets controlling different regions, for example, both set to 1.0

Prompt Techniques

For regional division mixing, the prompt needs to include descriptions of both regions:

"A woman in red dress, a cat riding a scooter, detailed background, high quality"

Such a prompt covers both the character and the cat on the scooter, ensuring the model pays attention to both control regions.

Multi-dimensional Control Applications for a Single Subject

In addition to the regional division mixing shown in this example, another common mixing approach is to apply multi-dimensional control to the same subject. For example:

  • Pose + Depth: Control character posture and spatial sense
  • Pose + Canny: Control character posture and edge details
  • Pose + Reference: Control character posture while referencing a specific style

In this type of application, reference images for multiple ControlNets should be aligned to the same subject, and their strengths should be adjusted to ensure proper balance.

By combining different types of ControlNets and specifying their control regions, you can achieve precise control over elements in your image.