Unable to reproduce ProntoQA results for DFS-ToT. Getting 0 accuracy #122
I am getting 0 accuracy (through the first 65 instances) on ProntoQA with Llama-3.1-Instruct-8B. I have attached args.txt and result.log. Is there anything I am missing? I only added some import statements and changed the path to the Llama model.

Thanks,

Comments
Hi, thanks for providing more details. Did you change the library code? I tried to reproduce the results using your arguments but couldn't successfully run it. From the log, the problem seems to be answer parsing. The generated reasoning chains look reasonable (for example, the reasoning in case 10 is actually correct), but somehow the separation of steps is wrong: after parsing, the generated reasoning chains cannot be matched to the expected form. I cannot tell which exact part caused the error based on the provided information (maybe the instruct-tuned model doesn't react to the prompt properly, or maybe the eos token setting doesn't work for Llama-3.1-Instruct), but it would be helpful if you could print the raw input and raw output of the language model and share them with me.
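(For reference, a minimal sketch of how the eos-token hypothesis could be checked with the Hugging Face transformers API. The model id, the placeholder prompt, and the generation settings are illustrative assumptions, not the repository's actual code; the relevant detail is that Llama-3.x-Instruct models end turns with `<|eot_id|>` rather than only the default eos token:)

```python
# Hypothetical check: if generation only stops on the default eos token,
# Llama-3.x-Instruct output may run past the intended turn boundary and
# break downstream answer parsing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed model path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Q: Every wumpus is a jompus. ..."  # placeholder ProntoQA-style prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Stop on both the default eos token and the instruct-tuned turn terminator.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
out = model.generate(**inputs, max_new_tokens=256, eos_token_id=terminators)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```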
Thanks for your response. Regarding the code: I was on commit ID 90fdf11. The diff contains only minor changes to get the code running (some import statements, plus removing logging); I have attached the diff file. I also had to remove `use_scaled_rope` from the config .json of Llama-3.1-Instruct that I downloaded using huggingface-cli. Regarding answer parsing: I agree with your speculation about the EOS token setting. As for the raw input and raw output of the LM, can you tell me which variables you would like me to print in the code? In the meantime, I will try an older Llama instruct version.
Can you try printing the raw input and raw output of the language model?
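(A minimal sketch of the kind of logging being requested here; the wrapper function and the log file names are hypothetical, not part of the repository. Keeping `skip_special_tokens=False` makes the eos/stop-token behavior visible in the dumped output:)

```python
# Hypothetical logging wrapper: dump the exact prompt and raw completion
# around the language-model call so parsing issues can be inspected.
def generate_with_logging(model, tokenizer, prompt, **gen_kwargs):
    with open("inputs.log", "a") as f:
        f.write(prompt + "\n---\n")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, **gen_kwargs)
    # Decode only the newly generated tokens, keeping special tokens visible.
    raw = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:],
                           skip_special_tokens=False)
    with open("outputs.log", "a") as f:
        f.write(raw + "\n---\n")
    return raw
```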
Sure, the inputs.log and outputs.log are attached separately.
I attempted to run the same code with Llama-3-8B-Instruct (as suggested in the original README, as opposed to 3.1-Instruct). Unfortunately, I see similar results, with similar answers in outputs.log.
Hi, we updated the code a little (supporting Hugging Face models for this task). I ran the following command (based on the one you shared) and it seemed to work well:

```
CUDA_VISIBLE_DEVICES=4 python examples/ToT/prontoqa/tot_inference.py --base_lm hf --model_dir meta-llama/Llama-3.1-8B-Instruct --batch_size 8 --search_algo dfs --log_dir logs/prontoqa_tot_dfs_abc --depth_limit 10 --total_states 10 --temperature 0.8 --max_per_state 3
```
Thanks a lot! This fixes the issue. The HF class should work with other models like Phi, etc. as-is, is that correct? I will verify issue #111 soon.
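(If the HF backend is model-agnostic, swapping models should only require changing `--model_dir` to another Hugging Face model id. The Phi model id below is an illustrative assumption, not something confirmed in this thread:)

```
CUDA_VISIBLE_DEVICES=0 python examples/ToT/prontoqa/tot_inference.py --base_lm hf --model_dir microsoft/Phi-3-mini-4k-instruct --batch_size 8 --search_algo dfs --log_dir logs/prontoqa_tot_dfs_phi --depth_limit 10 --total_states 10 --temperature 0.8 --max_per_state 3
```

Note that different instruct models use different chat templates and stop tokens, so the answer-parsing behavior discussed above may still need to be checked per model.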