Step 1: Prepare the dataset. Download it from here. We use the setup from Mao et al. 2019; see the original instructions here: GitHub Repo.
To replicate the experiments, you need to prepare your dataset as follows:
```
clevr
├── train
│   ├── images
│   ├── questions.json
│   ├── scenes-raw.json
│   ├── scenes.json
│   └── vocab.json
└── val
    ├── images
    ├── questions.json
    ├── scenes-raw.json
    ├── scenes.json
    └── vocab.json

CLEVR_CoGenT_v1
├── trainA
│   ├── images
│   ├── questions.json
│   ├── scenes-raw.json
│   ├── scenes.json
│   └── vocab.json
├── valA
│   ├── images
│   ├── questions.json
│   ├── scenes.json
│   └── vocab.json
└── valB
    ├── images
    ├── questions.json
    ├── scenes.json
    └── vocab.json
```
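As a quick sanity check after assembling the folders, a small Python snippet can verify that a split directory contains the expected entries. This is only a convenience sketch, not part of the repo's tooling:

```python
from pathlib import Path
import tempfile

EXPECTED = ["questions.json", "scenes.json", "vocab.json"]

def check_split(split_dir):
    """Return the list of expected entries missing from a split folder."""
    split_dir = Path(split_dir)
    missing = [name for name in EXPECTED if not (split_dir / name).exists()]
    if not (split_dir / "images").is_dir():
        missing.append("images/")
    return missing

# Toy demonstration on a temporary directory:
with tempfile.TemporaryDirectory() as root:
    split = Path(root) / "clevr" / "train"
    (split / "images").mkdir(parents=True)
    for name in EXPECTED:
        (split / name).write_text("{}")
    print(check_split(split))  # -> []
```

Running it over each split (`train`, `val`, `trainA`, …) before training can catch missing files early.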
You can download all images from the official website of the CLEVR dataset and put them under the `images/` folders. The `questions.json` and `scenes-raw.json` files can also be found on the website.
Next, you need to add object detection results for the scenes. Here, we use the tools provided by ns-vqa: in short, a pre-trained Mask R-CNN is used to detect all objects. We provide the JSON files with detected object bounding boxes at `clevr/train/scenes.json` and `clevr/val/scenes.json`.
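To give a sense of what the detection results contain, here is a toy scene entry with bounding boxes. The field names (`objects`, `bbox`) and box layout are illustrative assumptions, not the repo's exact schema:

```python
# Hypothetical structure of one entry in scenes.json; the exact field
# names (e.g. "objects", "bbox") are assumptions, not the repo's schema.
scene = {
    "image_filename": "CLEVR_train_000000.png",
    "objects": [
        {"color": "red", "shape": "cube", "bbox": [10, 20, 60, 80]},
        {"color": "blue", "shape": "sphere", "bbox": [100, 40, 150, 95]},
    ],
}

def box_area(bbox):
    """Area of an [x1, y1, x2, y2] bounding box."""
    x1, y1, x2, y2 = bbox
    return max(0, x2 - x1) * max(0, y2 - y1)

areas = [box_area(o["bbox"]) for o in scene["objects"]]
print(areas)  # -> [3000, 2750]
```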
The `vocab.json` file can be downloaded at this URL.
Step 2: Generate groundtruth programs for CLEVR/train and CLEVR/val.
```
jac-run scripts/gen-clevr-gt-program.py --input data/clevr/train/questions.json --output data/clevr/train/questions-ncprogram-gt.pkl
jac-run scripts/gen-clevr-gt-program.py --input data/clevr/val/questions.json --output data/clevr/val/questions-ncprogram-gt.pkl
```
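The output is a pickle file pairing each question with a program. The entry layout and operator names below are a sketch in the spirit of the CLEVR functional vocabulary; the actual serialized format of `questions-ncprogram-gt.pkl` may differ:

```python
import os
import pickle
import tempfile

# A toy question -> program pair. The exact serialized format of
# questions-ncprogram-gt.pkl is an assumption of this sketch.
entry = {
    "question": "What color is the large sphere?",
    "program": ["scene", "filter_size[large]", "filter_shape[sphere]",
                "unique", "query_color"],
}

path = os.path.join(tempfile.mkdtemp(), "toy-programs.pkl")
with open(path, "wb") as f:
    pickle.dump([entry], f)  # the real file stores one entry per question

with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded[0]["program"][-1])  # -> query_color
```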
Step 4: Training.
You will need Cython to compile some libraries; please install it before running the training commands.
```
jac-crun 0 scripts/trainval-clevr.py --desc experiments/desc_clevr_nesycoco.py \
  --data-dir data/clevr/train \
  --data-parses data/clevr/train/questions-ncprogram-gt.pkl data/clevr/val/questions-ncprogram-gt.pkl \
  --curriculum all --expr original --validation-interval 5 --data-tvsplit 0.95
```
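The `--data-tvsplit 0.95` flag controls how the training data is divided between training and held-out validation. Assuming it denotes the train fraction (an interpretation, not taken from the repo's docs), the split amounts to something like:

```python
def train_val_split(n_items, tvsplit=0.95):
    """Split indices into train/val; tvsplit gives the train fraction
    (the assumed meaning of --data-tvsplit)."""
    n_train = round(n_items * tvsplit)  # round to avoid float truncation
    indices = list(range(n_items))
    return indices[:n_train], indices[n_train:]

train_idx, val_idx = train_val_split(1000)
print(len(train_idx), len(val_idx))  # -> 950 50
```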
Step 5: Evaluation.
```
jac-crun 0 scripts/trainval-clevr.py --desc experiments/desc_clevr_nesycoco.py \
  --data-dir data/clevr/train \
  --data-parses data/clevr/train/questions-ncprogram-gt.pkl data/clevr/val/questions-ncprogram-gt.pkl \
  --curriculum all --expr original --validation-interval 5 --data-tvsplit 0.95 \
  --load <TRAINED_CHECKPOINT_FILE> \
  --validation-data-dir data/clevr/val --evaluate
```
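Evaluation on CLEVR is typically reported as exact-match question-answering accuracy. The repo's evaluation code computes this internally; as a generic sketch of the metric:

```python
def accuracy(predictions, answers):
    """Fraction of exact-match answers (a generic QA metric sketch,
    not the repo's evaluation code)."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

print(accuracy(["red", "2", "yes", "cube"],
               ["red", "3", "yes", "cube"]))  # -> 0.75
```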
```
jac-crun 0 scripts/trainval-clevr.py --desc experiments/desc_clevr_nesycoco.py \
  --data-dir data/clevr-mini --data-parses questions-ncprogram-gt.pkl transfer-questions-ncprogram-gt.json \
  --expr transfer \
  --load <TRAINED_CHECKPOINT_FILE> \
  --evaluate-custom ref --data-questions-json refexps-20230513.json
```
Note that here we use the CLEVR-Mini dataset from NS-VQA, as we need the groundtruth set of objects. You can also generate your own dataset using the script `scripts/gen-clevr-ref.py`. This script uses the groundtruth logic programs. To use the programs generated by GPT-4, use the files inside `data/clevr-parsings`.
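Conceptually, evaluating a referring expression means executing its program against the groundtruth object set and checking that it picks out the intended object. A toy version of that filtering step (object fields are assumptions for illustration):

```python
def filter_objects(objects, **attrs):
    """Return objects matching all given attribute values -- a toy
    version of executing a referring expression against the
    groundtruth object set (field names assumed)."""
    return [o for o in objects
            if all(o.get(k) == v for k, v in attrs.items())]

objects = [
    {"color": "red", "shape": "cube"},
    {"color": "red", "shape": "sphere"},
    {"color": "blue", "shape": "cube"},
]
# "the red cube" should pick out exactly one object:
print(filter_objects(objects, color="red", shape="cube"))
# -> [{'color': 'red', 'shape': 'cube'}]
```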
```
jac-crun 0 scripts/trainval-clevr.py --desc experiments/desc_clevr_nesycoco.py \
  --data-dir data/clevr-mini --data-parses questions-ncprogram-gt.pkl transfer-questions-ncprogram-gt.json \
  --expr transfer \
  --load <TRAINED_CHECKPOINT_FILE> \
  --evaluate-custom puzzle --data-questions-json puzzle-20230513.json
```
You can also generate your own dataset using the script `scripts/gen-clevr-puzzle.py`. This script uses the groundtruth logic programs. To use the programs generated by GPT-4, use the files inside `data/clevr-parsings`.
```
jac-crun 0 scripts/trainval-clevr.py --desc experiments/desc_clevr_nesycoco.py \
  --data-dir data/clevr-mini --data-parses questions-ncprogram-gt.pkl transfer-questions-ncprogram-gt.json \
  --expr transfer \
  --load <TRAINED_CHECKPOINT_FILE> \
  --evaluate-custom rpm --data-questions-json rpm-20230513.json
```
To use the programs generated by Llama 3, use the files inside `data/clevr-parsings`.
To generate those data-parses files, run the commands inside the `prompts/` directory. You need to install transformers (`pip install transformers`) before running the commands.
```
jac-run run-llama-prompt.py --dataset clevr-puzzles --questions <PATH_TO>/puzzle-20230513.json --output clevr_transfer_puzzle_llama3.pkl --prompt prompts-clevr-transfer.txt
jac-run run-llama-prompt.py --dataset clevr-refexps --questions <PATH_TO>/refexps-20230513.json --output clevr_transfer_ref_llama3.pkl --prompt prompts-clevr-transfer.txt
jac-run run-llama-prompt.py --dataset clevr-rpms --questions <PATH_TO>/rpm-20230513.json --output clevr_transfer_rpm_llama3.pkl --prompt prompts-clevr-transfer.txt
```
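These scripts pair a few-shot prompt file with each question and ask the model to emit a program. A minimal sketch of assembling such a prompt is shown below; the `Q:`/`P:` format is illustrative, and the actual layout of `prompts-clevr-transfer.txt` may differ:

```python
def build_prompt(prompt_template, question):
    """Append a new question to a few-shot prompt. The 'Q:/P:' format
    is illustrative, not the actual prompt file layout."""
    return f"{prompt_template.rstrip()}\nQ: {question}\nP:"

few_shot = (
    "Q: What color is the big sphere?\n"
    "P: query_color(unique(filter(scene(), big, sphere)))\n"
)
print(build_prompt(few_shot, "How many red cubes are there?"))
```

The model's completion after the final `P:` would then be parsed into a program and pickled to the `--output` file.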
The data for the CLEVR Synonym test sets are in the `data/clevr` folder. You can run the command below to generate new splits.
```
python scripts/generate_clevr_syn.py --data_dir <PATH_TO_CLEVR_DATA>
```
After the process, the `CLEVR_DATA` folder will look like this:
```
clevr
├── train
├── train-syn
├── val-syn-easy
├── val-syn-medium
├── val-syn-hard
└── val
```
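The synonym splits rephrase questions by swapping words for synonyms. The core substitution can be sketched as follows; the synonym table here is invented for illustration and is not the mapping used by `scripts/generate_clevr_syn.py`:

```python
import re

# Illustrative synonym table; the actual mapping used by
# scripts/generate_clevr_syn.py is not reproduced here.
SYNONYMS = {"sphere": "ball", "cube": "block", "large": "big"}

def apply_synonyms(question, table=SYNONYMS):
    """Replace whole words by their synonyms (word boundaries keep
    'cubes' from matching 'cube')."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], question)

print(apply_synonyms("Is the large sphere left of the cube?"))
# -> Is the big ball left of the block?
```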
To evaluate a trained model on this dataset, run the command below:
```
jac-crun 0 scripts/trainval-clevr.py --desc experiments/desc_clevr_nesycoco.py \
  --data-dir data/clevr/train \
  --data-parses data/clevr/train-syn/questions-ncprogram-gt.pkl data/clevr/val-syn-easy/questions-ncprogram-gt.pkl \
  --curriculum all --expr original --validation-interval 5 --data-tvsplit 0.95 \
  --load <TRAINED_CHECKPOINT_FILE> \
  --validation-data-dir data/clevr/val-syn-easy --evaluate
```