Self-Supervised Vision Transformers with DINO

PyTorch implementation and pretrained models for DINO. For details, see Emerging Properties in Self-Supervised Vision Transformers. [arXiv]

Installation

This codebase has been developed with:

  • python 3.9
  • pytorch 1.12.0
  • CUDA 11.3
  • torchvision 0.13.0

Make sure to install the requirements: pip3 install -r requirements.txt

⚠️ To execute the commands provided in the next sections for training and evaluation, the dino package should be included in the Python module search path:

export PYTHONPATH="${PYTHONPATH}:/path/to/your/dino"
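You can quickly verify the package is discoverable (this assumes the repository's dino/ directory is an importable Python package):

python3 -c "import dino"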

Data preparation

Vanilla pretraining

The dataset you intend to pretrain on should be structured as follows:

patch_pretraining/
   └──imgs/
       ├── patch_1.jpg
       ├── patch_2.jpg
       └── ...

Here, patch_pretraining/imgs/ is the directory of patches (e.g., in .jpg format) extracted using HS2P; these patches are used to pretrain the first Transformer block (ViT_patch).
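As a quick sanity check, here is a minimal sketch (paths and filenames below are hypothetical) to confirm the layout is readable:

from pathlib import Path
from PIL import Image

root = Path("patch_pretraining/imgs")  # hypothetical location of your extracted patches
patches = sorted(root.glob("*.jpg"))
print(f"found {len(patches)} patches")

img = Image.open(patches[0]).convert("RGB")  # all patches should share the same size
print(img.size)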

Hierarchical pretraining

If you want to run hierarchical pretraining, structure your data as follows:

region_pretraining/
   ├── slide_1_region_1.pt
   ├── slide_1_region_2.pt
   └── ...

Here, region_pretraining/ is the directory of pre-extracted region-level features, generated using python3 dino/extract_features.py. Each *.pt file is a [npatch × 384]-sized Tensor containing the sequence of pre-extracted ViT_patch features for each [patch_size × patch_size] patch in a given region. This folder is used to pretrain the intermediate Transformer block (ViT_region).
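A minimal sketch to inspect one of these feature files (the filename is hypothetical; npatch depends on the region and patch size):

import torch

features = torch.load("region_pretraining/slide_1_region_1.pt")
print(features.shape)  # expected: [npatch, 384], one ViT_patch feature vector per patch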

Training

In the following Python commands, make sure to replace {gpu} with the number of GPUs available for pretraining.

Vanilla ViT DINO pretraining 🦕

Update the config file dino/config/patch.yaml to match your local setup.
Then kick off distributed pretraining of a vanilla ViT-S/16:

python3 -m torch.distributed.run --nproc_per_node={gpu} dino/patch.py
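For example, on a machine with 4 GPUs:

python3 -m torch.distributed.run --nproc_per_node=4 dino/patch.py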

Alternatively, you can check notebooks/vanilla_dino.ipynb.

Hierarchical pretraining 🦖

Update the config file dino/config/region.yaml to match your local setup.
Then kick off distributed pretraining of a ViT-S/4096_256:

python3 -m torch.distributed.run --nproc_per_node={gpu} dino/region.py

Alternatively, you can check notebooks/hierarchical_dino.ipynb.
