4b Human perception
Getting human perception scores from street-level imagery. The perception categories are safety, lively, beautiful, wealthy, boring, and depressing.
The scores are on a scale of 0-10.
- Safety, lively, beautiful, wealthy: a high score indicates a strong positive feeling
- Boring, depressing: a high score indicates a strong negative feeling
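If downstream analysis needs all six dimensions oriented the same way, the two negative categories can be flipped onto a common "higher is better" scale. A minimal sketch; the category names come from the list above, but the inversion convention (10 minus score) is our assumption, not part of the model:

```python
# Perception categories and their polarity, as listed above.
POSITIVE = {"safety", "lively", "beautiful", "wealthy"}
NEGATIVE = {"boring", "depressing"}

def to_positive_scale(category: str, score: float) -> float:
    """Map a 0-10 score so that higher always means 'better'.

    Negative categories are inverted (10 - score); this inversion
    convention is an assumption made for illustration only.
    """
    if category in NEGATIVE:
        return 10.0 - score
    if category in POSITIVE:
        return score
    raise ValueError(f"unknown category: {category}")
```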
The models are pretrained on the MIT Place Pulse 2.0 dataset. The backbone of the model is a Vision Transformer (ViT) pretrained on ImageNet (ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1). We added three linear layers with ReLU activations to the ViT classification head.
Code snippet:
nn.Linear(num_fc, 512, bias=True),
nn.ReLU(True),
nn.Linear(512, 256, bias=True),
nn.ReLU(True),
nn.Linear(256, num_class, bias=True)
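The three-layer head above can be assembled as an `nn.Sequential` and sanity-checked on dummy embeddings. A sketch, assuming `num_fc` is the backbone's hidden dimension (768 for ViT-B/16) and `num_class` is 1 (one score per perception dimension); both values are assumptions for illustration, not read from the repo's code:

```python
import torch
import torch.nn as nn

num_fc = 768      # hidden size of ViT-B/16 (assumption based on the backbone named above)
num_class = 1     # one score per perception dimension (assumption)

# The three-layer MLP head from the snippet above.
head = nn.Sequential(
    nn.Linear(num_fc, 512, bias=True),
    nn.ReLU(True),
    nn.Linear(512, 256, bias=True),
    nn.ReLU(True),
    nn.Linear(256, num_class, bias=True),
)

# Run the head on a dummy batch of four ViT embeddings.
x = torch.randn(4, num_fc)
out = head(x)
print(out.shape)  # torch.Size([4, 1])
```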
The model structure can be found in code/model_training/perception/Model_01.py. The pretrained models will be downloaded automatically when running inference.py (the recommended method). You can also download the models manually here.
Set up the environment with requirements-cv-linux.txt.
Input
The input CSV should:
- have each row representing one image to process, and
- contain at minimum two columns, named uuid and path, specifying the image UUID and the local image file path, respectively
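A valid input CSV can be produced with the standard csv module. A minimal sketch; the file name and image paths are placeholders:

```python
import csv

# Write a minimal input CSV with the two required columns, uuid and path.
rows = [
    {"uuid": "img_0001", "path": "/data/images/img_0001.jpg"},
    {"uuid": "img_0002", "path": "/data/images/img_0002.jpg"},
]
with open("input.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["uuid", "path"])
    writer.writeheader()
    writer.writerows(rows)
```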
Output
One CSV for each perception dimension.
Each CSV contains two columns: uuid (the image name) and the inferred perception scores.
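Since one CSV is produced per dimension, a common post-processing step is joining them on uuid. A hedged sketch, assuming each output file is named `<dimension>.csv` with columns uuid and score; the actual file and column names emitted by inference.py may differ:

```python
import csv
from collections import defaultdict

DIMENSIONS = ["safety", "lively", "beautiful", "wealthy", "boring", "depressing"]

def merge_scores(out_dir: str) -> dict:
    """Collect per-dimension scores into {uuid: {dimension: score}}.

    Assumes one CSV per dimension named <dimension>.csv with
    'uuid' and 'score' columns (an assumption, not the repo's spec).
    """
    merged = defaultdict(dict)
    for dim in DIMENSIONS:
        with open(f"{out_dir}/{dim}.csv", newline="") as f:
            for row in csv.DictReader(f):
                merged[row["uuid"]][dim] = float(row["score"])
    return dict(merged)
```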
To reproduce sample_output
Modify out_Path in inference.py to the directory where you wish to store the output CSVs, then run:
python3 inference.py
Modify inference.py:
- Modify out_Path to the directory where you wish to store the output CSVs
- Modify in_Path to the path of your input CSV
Run:
python3 inference.py
Our work in human perception builds on and uses code from human-perception-place-pulse developed by Ouyang (2023).