
Issue with Incomplete Evaluation on re10k Dataset and Lower-than-Expected Results #52

Open
Yang-xiao-uts opened this issue Feb 22, 2025 · 5 comments

@Yang-xiao-uts

Dear authors,

Thank you for your work. I have a question regarding evaluation on the re10k dataset: my test results are somewhat lower than those reported in the paper, and only 6,474 of the 7,286 test samples were processed. Upon inspecting the code, I noticed that some examples are skipped due to the following condition:

try:
    context_indices, target_indices = self.view_sampler.sample(
        scene,
        extrinsics,
        intrinsics,
    )
except ValueError:
    # Skip because the example doesn't have enough frames.
    continue

I used the processed versions of the dataset from pixelSplat, which I downloaded from the following link:
http://schadenfreude.csail.mit.edu:8000/
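
For reference, here is a minimal sketch I can use to count the scenes in the downloaded test split and flag ones with very few frames. It assumes the pixelSplat-style chunk layout (each .torch file stores a list of per-scene dicts with an "images" list); the directory path and frame threshold are illustrative and may need adjusting:

import torch
from pathlib import Path

# Count test scenes in pixelSplat-style chunks and flag scenes with very few
# frames (layout assumption: each .torch chunk = a list of per-scene dicts).
test_dir = Path("datasets/re10k/test")
num_scenes = 0
few_frame_scenes = 0
for chunk_path in sorted(test_dir.glob("*.torch")):
    for example in torch.load(chunk_path):
        num_scenes += 1
        if len(example["images"]) < 3:  # 2 context + 1 target as a loose lower bound
            few_frame_scenes += 1
print(f"{num_scenes} test scenes, {few_frame_scenes} with fewer than 3 frames")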

Interestingly, my results on the dl3dv dataset were as expected; the issue occurs only when testing on re10k. Could you help me identify the cause of this discrepancy?

Below is my test script:

# evaluate on re10k
CUDA_VISIBLE_DEVICES=0 python -m src.main +experiment=re10k \
dataset.test_chunk_interval=1 \
model.encoder.num_scales=2 \
model.encoder.upsample_factor=2 \
model.encoder.lowest_feature_resolution=4 \
model.encoder.monodepth_vit_type=vitl \
model.encoder.gaussian_regressor_channels=64 \
model.encoder.color_large_unet=true \
model.encoder.feature_upsampler_channels=128 \
checkpointing.pretrained_model=/pretrained/depthsplat-gs-large-re10k-256x256-288d9b26.pth \
mode=test \
dataset/view_sampler=evaluation \
test.compute_scores=true \
wandb.mode=disabled \
test.save_image=false \
test.save_depth=true \
test.save_depth_concat_img=true \
output_dir=output/depthsplat-depth-large-re10k_train

Thank you!

@haofeixu
Member

Hi, the number of test samples (6,474) is correct and consistent with previous methods (i.e., some samples are intentionally skipped).

Could you double-check whether the pre-trained model is loaded correctly? There is a leading / in your path (I'm not sure whether you stored the weights in /pretrained rather than ./pretrained). Thanks.
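
As a quick sanity check before launching the full evaluation, something like the following could be used to confirm the checkpoint path resolves and to inspect what it contains. This is only a sketch; the path and the "state_dict" key are assumptions about how the checkpoint is stored:

import torch

# Load the checkpoint on CPU and report how many entries it contains.
# The path and the "state_dict" key are illustrative assumptions.
ckpt_path = "./pretrained/depthsplat-gs-large-re10k-256x256-288d9b26.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")
state = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
print(f"loaded {ckpt_path}: {len(state)} entries")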

@Yang-xiao-uts
Author

Thank you for your response! I have double-checked, and the pre-trained model does appear to load correctly. The PSNR on the RE10K dataset is only about 1 point lower than reported in the paper, which suggests the model is working, and the results on the DL3DV dataset match those in the paper, further indicating that the weights are loaded correctly. Let me know if you have any other thoughts!

@haofeixu
Member

Hi, could you reproduce the results with small and base models?

@Yang-xiao-uts
Author

I reproduced the results with the small and base models. The evaluation metrics are as follows (the large RE10K model is included for reference):

RE10K dataset:
  Model   Checkpoint                                        PSNR     SSIM     LPIPS
  Small   depthsplat-gs-small-re10k-256x256-49b2d15c.pth    26.358   0.8731   0.1268
  Base    depthsplat-gs-base-re10k-256x256-044fdb17.pth     26.549   0.8782   0.1257
  Large   depthsplat-gs-large-re10k-256x256-288d9b26.pth    26.309   0.8769   0.1281

DL3DV dataset:
  Base    depthsplat-gs-base-dl3dv-256x448-75cc0183.pth     19.001   0.6082   0.3148

@haofeixu
Member

Hi, it turns out that all the numbers on re10k are slightly worse than our results.
