Note: Our pre-training dataset is inherited from the MDETR paper; you can also refer to their pre-training instructions to prepare the dataset and potentially train the model on other datasets.
- Download the original Flickr30k image dataset from the Flickr30K webpage and update `flickr_img_path` to point to the folder containing the images.
- Download the original Flickr30k Entities annotations from the Flickr30k annotations page and update `flickr_dataset_path` to point to the folder with the annotations.
- Download MDETR's pre-processed annotations from this link and update `flickr_ann_path` to point to the folder with the pre-processed annotations (a sketch of one possible layout follows this list).
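For reference, here is a minimal sketch of how the three paths might map onto a local directory layout. The directory names are hypothetical, and the exact config file or flag through which each path is set depends on this repo's setup:

```bash
# Hypothetical layout only; adjust to wherever you placed each download.
flickr_img_path=/data/flickr30k/flickr30k-images        # original Flickr30k images
flickr_dataset_path=/data/flickr30k/flickr30k_entities  # Flickr30k Entities annotations
flickr_ann_path=/data/flickr30k/mdetr_annotations       # MDETR pre-processed annotations
```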
We trained our model with 8 Nvidia A40 GPUs, each with 48GB of VRAM. Pre-training took 3 days to finish 150k steps.
The pre-training process is logged with WandB.
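If you have not used WandB before, you will need to authenticate once before launching, or you can run offline. Both commands below are standard WandB usage, not specific to this repo:

```bash
# One-time authentication with your WandB API key (standard wandb CLI).
wandb login

# Or disable online syncing entirely; runs are then stored locally.
export WANDB_MODE=offline
```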
The `pretrain_flickr.sh` script contains the exact command we used to pre-train our model:
```bash
bash scripts/pretrain/pretrain_w2bert.sh
```
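For orientation, such a script typically wraps a standard multi-GPU launcher along the lines of the sketch below. This is hypothetical: `main.py` and the flags shown are illustrative placeholders, so consult the actual script for the real entry point and arguments.

```bash
# Illustrative 8-GPU launch (torchrun is PyTorch's standard launcher).
# `main.py` and the flags below are hypothetical placeholders, not this
# repo's actual interface; see scripts/pretrain/pretrain_w2bert.sh.
torchrun --nproc_per_node=8 main.py \
    --dataset_config configs/pretrain.json \
    --output_dir runs/pretrain_flickr
```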
Note:
- For analysis purposes, we also saved a series of checkpoints during the pre-training process. If you don't want to save these checkpoints, you can remove the `--save_for_aoa` flag from the script.
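To locate the flag before editing, something like the following should work, assuming the script path from the command above:

```bash
# Find where --save_for_aoa is passed in the pre-training script.
grep -n "save_for_aoa" scripts/pretrain/pretrain_w2bert.sh
```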