Unable to reproduce ProntoQA results for DFS-ToT. Getting 0 accuracy #122
I am getting 0 accuracy (through the first 65 instances) on ProntoQA with Llama-3.1-Instruct-8B. I have attached args.txt and result.log. Is there anything I am missing? I only added some import statements and changed the path to the Llama model.

Thanks,

Comments
Hi, thanks for providing more details. Did you change the library code? I tried to reproduce the results using your arguments but couldn't successfully run it. From the log, the problem seems to be answer parsing. The generated reasoning chains look reasonable (for example, the reasoning in case 10 is actually correct), but somehow the separation of steps is wrong: after parsing, the generated reasoning chains cannot be matched to the expected form. I cannot tell which exact part caused the error based on the provided information (maybe the instruct-tuned model doesn't react to the prompt properly, or maybe the eos token setting doesn't work for Llama-3.1-Instruct), but it would be helpful if you could print the raw input and raw output of the language model and share them with me.
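(For reference, a minimal sketch of how the eos-token hypothesis could be checked with the Hugging Face transformers API. The model id, the placeholder prompt, and the generation settings are illustrative assumptions, not the repository's actual code; the relevant detail is that Llama-3.x-Instruct models end turns with `<|eot_id|>` rather than only the default eos token:)

```python
# Hypothetical check: if generation only stops on the default eos token,
# Llama-3.x-Instruct output may run past the intended turn boundary and
# break downstream answer parsing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed model path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Q: Every wumpus is a jompus. ..."  # placeholder ProntoQA-style prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Stop on both the default eos token and the instruct-tuned turn terminator.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
out = model.generate(**inputs, max_new_tokens=256, eos_token_id=terminators)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```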
Thanks for your response. Regarding the code: I was on commit ID 90fdf11. The diff contains only minor changes to get the code running (some import statements, plus removing logging); I have attached the diff file. I also had to remove `use_scaled_rope` from the config .json of Llama-3.1-Instruct that I downloaded using huggingface-cli. Regarding answer parsing: I agree with your speculation about the EOS token setting. As for the raw input and raw output of the LM, can you tell me which variables you would like me to print in the code? In the meantime, I will try an older Llama instruct version.
Can you try printing the raw input and raw output of the language model?
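(A minimal sketch of the kind of logging being requested here; the wrapper function and the log file names are hypothetical, not part of the repository. Keeping `skip_special_tokens=False` makes the eos/stop-token behavior visible in the dumped output:)

```python
# Hypothetical logging wrapper: dump the exact prompt and raw completion
# around the language-model call so parsing issues can be inspected.
def generate_with_logging(model, tokenizer, prompt, **gen_kwargs):
    with open("inputs.log", "a") as f:
        f.write(prompt + "\n---\n")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, **gen_kwargs)
    # Decode only the newly generated tokens, keeping special tokens visible.
    raw = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:],
                           skip_special_tokens=False)
    with open("outputs.log", "a") as f:
        f.write(raw + "\n---\n")
    return raw
```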
Sure, the inputs.log and outputs.log are attached separately.
I attempted to run the same code with Llama-3-8B-Instruct (as suggested in the original README, as opposed to 3.1-Instruct). Unfortunately, I see similar results, with similar answers in outputs.log.
Hi, we updated the code a little (supporting Hugging Face models for this task). I ran the following command (based on the one you shared) and it seemed to work well:

```
CUDA_VISIBLE_DEVICES=4 python examples/ToT/prontoqa/tot_inference.py --base_lm hf --model_dir meta-llama/Llama-3.1-8B-Instruct --batch_size 8 --search_algo dfs --log_dir logs/prontoqa_tot_dfs_abc --depth_limit 10 --total_states 10 --temperature 0.8 --max_per_state 3
```
Thanks a lot! This fixes the issue. The HF class should work with other models like Phi, etc. as-is, is that correct? I will verify issue #111 soon.
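(If the HF backend is model-agnostic, swapping models should only require changing `--model_dir` to another Hugging Face model id. The Phi model id below is an illustrative assumption, not something confirmed in this thread:)

```
CUDA_VISIBLE_DEVICES=0 python examples/ToT/prontoqa/tot_inference.py --base_lm hf --model_dir microsoft/Phi-3-mini-4k-instruct --batch_size 8 --search_algo dfs --log_dir logs/prontoqa_tot_dfs_phi --depth_limit 10 --total_states 10 --temperature 0.8 --max_per_state 3
```

Note that different instruct models use different chat templates and stop tokens, so the answer-parsing behavior discussed above may still need to be checked per model.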