In the paper, it says "To process the masked image features, BrushNet utilizes a clone of the pre-trained diffusion model while excluding its cross-attention layers. The pretrained weights of the diffusion model serve as a strong prior for extracting the masked image features, while the removal of the cross-attention layers ensures that only pure image information is considered within this additional branch." So I assumed that BrushNet keeps only the self-attention blocks.
But when I checked the BrushNet config file and code and printed the BrushNet modules, I only saw ResNet blocks, linear layers, etc. And the 2D blocks specified in the BrushNet config file are pure conv blocks (DownBlock2D, plus the corresponding mid and up blocks).
So what you removed is not only the cross-attention layers but also the self-attention layers, i.e. the whole attention block inside CrossAttnDownBlock2D, right?
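For reference, here is a minimal sketch (not BrushNet's own code) I used to check which attention layers disappear when a CrossAttnDownBlock2D is swapped for a plain DownBlock2D. It assumes the diffusers library and access to the SD v1.5 UNet weights; the model path is just an example.

```python
# Sketch: count attention submodules per down block of a standard SD UNet.
# CrossAttnDownBlock2D contains transformer blocks with attn1 (self-attention)
# and attn2 (cross-attention); a plain DownBlock2D contains only ResNet blocks.
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

for i, block in enumerate(unet.down_blocks):
    # attn1 = self-attention, attn2 = cross-attention inside each transformer block
    attn_names = [n for n, _ in block.named_modules()
                  if n.endswith(("attn1", "attn2"))]
    print(f"down_blocks[{i}] ({block.__class__.__name__}): "
          f"{len(attn_names)} attention modules")
```

On SD v1.5, the CrossAttnDownBlock2D blocks report both attn1 and attn2 modules while the final DownBlock2D reports none, which is why replacing the CrossAttn* block types in the config drops self- and cross-attention together.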