Unable to reproduce ProntoQA results for DFS-ToT. Getting 0 accuracy #122

Closed
sumedhpendurkar opened this issue Jan 3, 2025 · 8 comments

@sumedhpendurkar

I am getting 0 accuracy (on the first 65 instances so far) for ProntoQA with Llama-3.1-8B-Instruct.

I have attached the args.txt and result.log.

Is there anything I am missing? I only added some import statements and changed the path to the Llama model.

Thanks,

@Ber666
Collaborator

Ber666 commented Jan 6, 2025

Hi, thanks for providing more details. Did you change the library code? I tried to reproduce the results using your arguments but couldn't successfully run it. Our Llama3Model class may not be compatible with Llama 3.1.

From the log, the problem seems to be answer parsing. The generated reasoning chains look reasonable (for example, the reasoning for case 10 is actually correct), but the separation of steps is somehow wrong, so after parsing, the generated chains cannot be matched to the expected form. I cannot tell which exact part caused the error from the provided information (maybe the instruct-tuned model doesn't respond to the prompt properly, or maybe the EOS token setting doesn't work for Llama-3.1-Instruct), but it would be helpful if you could print the raw input and raw output of the language model and share them with me.
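
To make the parsing issue concrete: the evaluation expects the generated chain to be split into discrete steps and the final answer to be read from the last step. A toy version of that kind of parsing (not our exact code, just to show where it breaks when the steps aren't separated properly) would be:

```python
# Toy illustration of answer parsing -- NOT the exact code in the repo.
# The chain is split into steps on newlines, and the final answer
# ("True"/"False") is read from the last step. If the model never emits
# the expected step/EOS boundaries, the split yields one big blob and
# the answer cannot be matched, which shows up as 0 accuracy.
from typing import Optional

def parse_answer(chain: str) -> Optional[str]:
    steps = [s.strip() for s in chain.strip().split("\n") if s.strip()]
    if not steps:
        return None
    last = steps[-1].lower()
    if "true" in last:
        return "True"
    if "false" in last:
        return "False"
    return None  # unparseable -> counted as incorrect
```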

@sumedhpendurkar
Author

Thanks for your response.

Regarding the code: I was on commit 90fdf11. The diff contains only minor changes to get the code working (some import statements plus removing logging); I have attached the diff file. I also had to remove `use_scaled_rope` from the .json file that came with the Llama-3.1-Instruct checkpoint I downloaded using huggingface-cli.
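
For reference, the edit was just dropping that key from the downloaded config. A sketch of what I did (the file name here is an assumption; use whichever JSON config your checkpoint actually ships with):

```python
# Sketch: drop "use_scaled_rope" from the checkpoint's JSON config so the
# original Llama3Model code doesn't reject the unknown field.
import json

path = "/path/to/Llama-3.1-8B-Instruct/params.json"  # assumed file name
with open(path) as f:
    cfg = json.load(f)
cfg.pop("use_scaled_rope", None)
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```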

Answer parsing: I agree with your speculation about the EOS token setting. Regarding the raw input and raw output of the LM, can you tell me which variables you would like me to print in the code?
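
If the EOS setting is indeed the culprit, my guess (based on the usual transformers / Llama-3 convention, not on this repo's Llama3Model code) is that generation for the instruct model also has to stop on <|eot_id|>, roughly like this:

```python
# Guess at an EOS fix using the standard transformers API (hypothetical;
# the repo's Llama3Model implementation may differ).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama-3 instruct models end a turn with <|eot_id|>, not only
# <|end_of_text|>, so both need to be treated as stop tokens.
eos_ids = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

prompt = "Jompuses are vumpuses. Alex is a jompus. Is Alex a vumpus?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, eos_token_id=eos_ids)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```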

In the meantime, I will try using the older Llama instruct version.

@Ber666
Collaborator

Ber666 commented Jan 7, 2025

> Regarding the raw input and raw output of the LM, can you tell me which variables you would like me to print in the code?

Can you try printing the `inputs` and `out_tokens` in `Llama3Model.generate`?
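
If it's easier than editing the class directly, a small wrapper that dumps the raw I/O to a file would also work (just a sketch, not code from the repo):

```python
# Sketch of a logging wrapper (hypothetical helper, not part of the repo):
# it intercepts model.generate and appends the raw inputs and outputs to a
# log file, so we can see exactly what the LM receives and produces.
import functools

def log_generate(model, log_path="lm_io.log"):
    original_generate = model.generate

    @functools.wraps(original_generate)
    def wrapped(inputs, *args, **kwargs):
        with open(log_path, "a") as f:
            f.write(f"INPUTS: {inputs!r}\n")
        result = original_generate(inputs, *args, **kwargs)
        with open(log_path, "a") as f:
            f.write(f"OUTPUTS: {result!r}\n")
        return result

    model.generate = wrapped
    return model
```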

@sumedhpendurkar
Author

Sure, the inputs.log and outputs.log are attached separately.

@sumedhpendurkar
Author

I attempted to run the same code with Llama-3-8B-Instruct (as suggested in the original README, as opposed to 3.1-Instruct). Unfortunately, I see similar results, with similar answers in outputs.log.

@Ber666
Collaborator

Ber666 commented Jan 10, 2025

Hi, we updated the code a bit (adding Hugging Face model support for this task). I ran the following command (based on the one you shared) and it seemed to work well.

CUDA_VISIBLE_DEVICES=4 python examples/ToT/prontoqa/tot_inference.py --base_lm hf --model_dir meta-llama/Llama-3.1-8B-Instruct --batch_size 8 --search_algo dfs --log_dir logs/prontoqa_tot_dfs_abc --depth_limit 10 --total_states 10 --temperature 0.8 --max_per_state 3

@sumedhpendurkar
Author

Thanks a lot! This fixes the issue. The HF class should work with other models like Phi, etc. as-is, is that correct?
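
In other words, my understanding is that the HF backend is essentially a generic transformers wrapper, so any causal LM that loads like the sketch below should also plug in (illustration only, not the repo's actual HF class; the Phi model id is just an example):

```python
# Sketch: if the HF backend wraps transformers' AutoModelForCausalLM, any
# Hub model that loads this way (e.g., Phi) should work the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # example substitute model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```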

I will soon verify issue #111.

@Ber666
Collaborator

Ber666 commented Jan 11, 2025 via email
