[NeurIPS 2024] Factor Graph Diffusion Models for Improved Prompt Alignment, Controllability, and Editing of Images via Joint Distribution Modeling


FG-DM

Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis
Deepak Sridhar, Abhishek Peri, Rohith Rachala, Nuno Vasconcelos
NeurIPS '24 | GitHub | arXiv | Project page

Cloning

Use --recursive to also clone the segmentation editor app:

git clone --recursive https://github.com/DeepakSridhar/fgdm.git
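If the repository was already cloned without --recursive, the submodule can still be fetched afterwards with standard git (this is generic git behavior, not specific to this repo):

```shell
# Run inside the cloned fgdm/ directory to fetch any submodules
# skipped during the initial clone (e.g. the segmentation editor app).
git submodule update --init --recursive
```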

Requirements

A suitable conda environment named ldm can be created and activated with:

conda env create -f fgdm.yaml
conda activate ldm

Dataset

We used the COCO 2017 dataset to train the FG-DMs.

  1. Download the COCO 2017 dataset from the official COCO Dataset Website. Download the following components:
     • Annotations: caption and instance annotations.
     • Images: train2017, val2017, and test2017.
  2. Extract all downloaded files into the /data/coco directory (or your preferred location). Place the annotation files in the annotations/ folder and the image folders in the images/ folder.
  3. Verify that your directory structure matches the layout below.

coco/
|---- annotations/
|------- captions_train2017.json
|------- captions_val2017.json
|------- instances_train2017.json
|------- instances_val2017.json
|------- train2017/
|------- val2017/
|---- images/
|------- train2017/
|------- val2017/
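The layout above can be checked with a small script (a hypothetical helper, not part of the repo; adjust the root path to your install location):

```python
import os

# Expected entries relative to the COCO root, matching the layout above.
EXPECTED = [
    "annotations/captions_train2017.json",
    "annotations/captions_val2017.json",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "images/train2017",
    "images/val2017",
]


def missing_coco_paths(root):
    """Return the list of expected entries missing under `root` (empty = OK)."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_coco_paths("/data/coco")
    print("OK" if not missing else "Missing: " + ", ".join(missing))
```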

FG-DM Pretrained Weights

The segmentation FG-DM weights are available on Google Drive. Place them under the models directory.

Inference: Text-to-Image with FG-DM

bash run_inference.sh

Training: FG-DM Seg from scratch

  • We used sd-v1-4 weights for training the FG-DM conditions, but sd-v1-5 is also compatible.

  • The original SD weights are available via the CompVis organization at Hugging Face. The license terms are identical to those of the original weights.

  • sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

  • Download the condition weights from ControlNet and place them in the models folder to train depth and normal FG-DMs.

  • Alternatively, download all of these models by running the download_models.sh script in the scripts directory.

python main.py --base configs/stable-diffusion/nautilus_coco_adapter_semantic_map_gt_captions_distill_loss.yaml -t --gpus 0,

Acknowledgements

Our codebase for the diffusion models builds heavily on the LDM codebase and ControlNet.

Thanks for open-sourcing!

BibTeX

@inproceedings{neuripssridhar24,
  author    = {Sridhar, Deepak and Peri, Abhishek and Rachala, Rohit and Vasconcelos, Nuno},
  title     = {Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
}
