Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis
Deepak Sridhar,
Abhishek Peri,
Rohith Rachala,
Nuno Vasconcelos
NeurIPS '24 | GitHub | arXiv | Project page
Use `--recursive` to also clone the segmentation editor app:
git clone --recursive https://github.com/DeepakSridhar/fgdm.git
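If you cloned without `--recursive`, the editor app can be fetched afterwards; this assumes it is tracked as a git submodule, which the flag above suggests:

```bash
# Fetch submodules after a non-recursive clone
git submodule update --init --recursive
```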
A suitable conda environment named `ldm` can be created and activated with:
conda env create -f fgdm.yaml
conda activate ldm
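As an optional sanity check (not part of the original setup), you can confirm that the environment resolves PyTorch and sees a GPU:

```bash
# Optional: verify the PyTorch install and CUDA visibility inside the ldm env
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```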
We used the COCO 2017 dataset for training the FG-DMs.
- Download the COCO 2017 dataset from the official COCO dataset website. You will need the following components: annotations (caption and instance annotations) and images (train2017, val2017, and test2017). A download sketch is given after the directory tree below.
- Extract files: extract all downloaded archives into the /data/coco directory (or a location of your choice). Place the annotation files in the annotations/ folder and the image folders in the images/ folder.
- Verify the directory structure: ensure that your directory layout matches the tree below.
coco/
|---- annotations/
|------- captions_train2017.json
|------- captions_val2017.json
|------- instances_train2017.json
|------- instances_val2017.json
|------- train2017/
|------- val2017/
|---- images/
|------- train2017/
|------- val2017/
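As referenced above, here is a download sketch using the standard COCO 2017 mirror URLs; the `/data/coco` target mirrors the layout shown (adjust paths to your setup):

```bash
# Sketch: download and unpack COCO 2017 into /data/coco
mkdir -p /data/coco/images
cd /data/coco
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/zips/test2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip -q train2017.zip -d images/
unzip -q val2017.zip -d images/
unzip -q test2017.zip -d images/
unzip -q annotations_trainval2017.zip   # creates annotations/ with the caption and instance JSONs
```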
The segmentation FG-DM weights are available on Google Drive. Place them under the `models` directory and run:
bash run_inference.sh
- We used SD v1.4 weights for training the FG-DM conditions, but SD v1.5 is also compatible.
- The original SD weights are available via the CompVis organization on Hugging Face. The license terms are identical to those of the original weights.
- `sd-v1-4.ckpt`: resumed from `sd-v1-2.ckpt`; 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
- Download the condition weights from ControlNet and place them in the `models` folder to train the depth and normal FG-DMs.
- Alternatively, download all of these models by running the `download_models.sh` script under the `scripts` directory. A manual download sketch follows this list.
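As a manual alternative, the base checkpoint can be fetched directly; the URL below assumes the standard CompVis Hugging Face repository layout, and the ControlNet condition weights come from their respective release pages:

```bash
# Sketch: place the SD v1.4 checkpoint under models/ (URL assumes the CompVis HF repo)
mkdir -p models
wget -O models/sd-v1-4.ckpt \
  https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
```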
python main.py --base configs/stable-diffusion/nautilus_coco_adapter_semantic_map_gt_captions_distill_loss.yaml -t --gpus 0,
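Assuming `main.py` inherits the LDM/PyTorch Lightning CLI, `--gpus` takes a comma-separated device list (the trailing comma marks a single entry as a list of devices rather than a GPU count). A sketch for a two-GPU run with the same config:

```bash
# Sketch: two-GPU training run (Lightning-style --gpus device list)
python main.py --base configs/stable-diffusion/nautilus_coco_adapter_semantic_map_gt_captions_distill_loss.yaml -t --gpus 0,1,
```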
Our codebase for the diffusion models builds heavily on the LDM codebase and ControlNet.
Thanks for open-sourcing!
@inproceedings{neuripssridhar24,
author = {Sridhar, Deepak and Peri, Abhishek and Rachala, Rohith and Vasconcelos, Nuno},
title = {Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis},
booktitle = {Neural Information Processing Systems},
year = {2024},
}