
Issue with Incomplete Evaluation on re10k Dataset and Lower-than-Expected Results #52

Open
Yang-xiao-uts opened this issue Feb 22, 2025 · 5 comments

@Yang-xiao-uts

Dear authors,

Thank you for your work. I have a question regarding evaluation on the re10k dataset: my test results are somewhat lower than those reported in the paper, and only 6,474 of the 7,286 test samples were processed. Upon inspecting the code, I noticed that some examples are skipped due to the following condition:

try:
    context_indices, target_indices = self.view_sampler.sample(
        scene,
        extrinsics,
        intrinsics,
    )
except ValueError:
    # Skip because the example doesn't have enough frames.
    continue

I used the processed versions of the dataset from pixelSplat, which I downloaded from the following link:
http://schadenfreude.csail.mit.edu:8000/
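
For reference, here is a minimal sketch I can use to count the scenes in the downloaded test split and flag ones with very few frames. It assumes the pixelSplat-style chunk layout (each .torch file stores a list of per-scene dicts with an "images" list); the directory path and frame threshold are illustrative and may need adjusting:

import torch
from pathlib import Path

# Count test scenes in pixelSplat-style chunks and flag scenes with very few
# frames (layout assumption: each .torch chunk = a list of per-scene dicts).
test_dir = Path("datasets/re10k/test")
num_scenes = 0
few_frame_scenes = 0
for chunk_path in sorted(test_dir.glob("*.torch")):
    for example in torch.load(chunk_path):
        num_scenes += 1
        if len(example["images"]) < 3:  # 2 context + 1 target as a loose lower bound
            few_frame_scenes += 1
print(f"{num_scenes} test scenes, {few_frame_scenes} with fewer than 3 frames")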

Interestingly, my results on the dl3dv dataset were as expected; the issue occurs only when testing on re10k. Could you help me identify the cause of this discrepancy?

Below is my test script:

# evaluate on re10k
CUDA_VISIBLE_DEVICES=0 python -m src.main +experiment=re10k \
dataset.test_chunk_interval=1 \
model.encoder.num_scales=2 \
model.encoder.upsample_factor=2 \
model.encoder.lowest_feature_resolution=4 \
model.encoder.monodepth_vit_type=vitl \
model.encoder.gaussian_regressor_channels=64 \
model.encoder.color_large_unet=true \
model.encoder.feature_upsampler_channels=128 \
checkpointing.pretrained_model=/pretrained/depthsplat-gs-large-re10k-256x256-288d9b26.pth \
mode=test \
dataset/view_sampler=evaluation \
test.compute_scores=true \
wandb.mode=disabled \
test.save_image=false \
test.save_depth=true \
test.save_depth_concat_img=true \
output_dir=output/depthsplat-depth-large-re10k_train

Thank you!

@haofeixu
Member

Hi, the number of test samples (6,474) is correct and consistent with previous methods (i.e., some samples are intentionally skipped).

Could you double-check whether the pre-trained model is loaded correctly? There is a leading / in your path (I'm not sure whether you stored the weights in /pretrained rather than ./pretrained). Thanks.
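
As a quick sanity check before launching the full evaluation, something like the following could be used to confirm the checkpoint path resolves and to inspect what it contains. This is only a sketch; the path and the "state_dict" key are assumptions about how the checkpoint is stored:

import torch

# Load the checkpoint on CPU and report how many entries it contains.
# The path and the "state_dict" key are illustrative assumptions.
ckpt_path = "./pretrained/depthsplat-gs-large-re10k-256x256-288d9b26.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")
state = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
print(f"loaded {ckpt_path}: {len(state)} entries")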

@Yang-xiao-uts
Author

Thank you for your response! I have double-checked, and the pre-trained model does appear to load correctly. The PSNR on the RE10K dataset is only about 1 point lower than reported in the paper, which suggests the model is working, and the results on the DL3DV dataset match those in the paper, further indicating that the weights are loaded correctly. Let me know if you have any other thoughts!

@haofeixu
Member

Hi, could you reproduce the results with small and base models?

@Yang-xiao-uts
Author

I reproduced the results with the small and base models. The evaluation metrics are as follows (the large RE10K model is included for reference):

RE10K dataset:
  Model   Checkpoint                                        PSNR     SSIM     LPIPS
  Small   depthsplat-gs-small-re10k-256x256-49b2d15c.pth    26.358   0.8731   0.1268
  Base    depthsplat-gs-base-re10k-256x256-044fdb17.pth     26.549   0.8782   0.1257
  Large   depthsplat-gs-large-re10k-256x256-288d9b26.pth    26.309   0.8769   0.1281

DL3DV dataset:
  Base    depthsplat-gs-base-dl3dv-256x448-75cc0183.pth     19.001   0.6082   0.3148

@haofeixu
Member

Hi, it turns out that all the numbers on re10k are slightly worse than our results.
