Failed to reproduce llama3-8b result #14
Hi! You should use the
Hi, thank you for providing the code and tips for reproducing the LLaMA 3 results! I modified the LLaMA 2 code based on your suggestions:
message = ""
message += "<|begin_of_text|><|start_header_id|>system<|end_header_id|>"
message += "\n" + sys_prompt
message += "<|eot_id|><|start_header_id|>user<|end_header_id|>"
message += "\n" + context
message += "<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

On most datasets my results for the six tasks are within an acceptable gap of yours, but the GSM100k result I got is somehow very bad.
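For reference, the concatenation above can be wrapped in a small helper; a minimal sketch (the function name `build_llama3_prompt` is my own, not from the repo):

```python
def build_llama3_prompt(sys_prompt: str, context: str) -> str:
    """Assemble a LLaMA 3 chat prompt from a system prompt and user
    context using the special header/eot tokens shown above."""
    message = ""
    message += "<|begin_of_text|><|start_header_id|>system<|end_header_id|>"
    message += "\n" + sys_prompt
    message += "<|eot_id|><|start_header_id|>user<|end_header_id|>"
    message += "\n" + context
    message += "<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    return message
```

The trailing assistant header leaves the prompt open for the model to generate its reply, which is the standard shape for a LLaMA 3 single-turn chat prompt.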
Hi! I suggest using:
The
Thanks for your reply. I found that the role special tokens have to be added to all the examples in GSM100k, for example:

context = document + "\n\n" + inst
context = context.replace(
    "Question:",
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\nQuestion:"
)
context = context.replace(
    "Let's think step by step",
    "Let's think step by step\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
)
message = ""
message += "<|begin_of_text|><|start_header_id|>system<|end_header_id|>"
message += "\n" + sys_prompt
message += context

Then the accuracy will be 78! There is also the other option of not using any chat format at all:

message = sys_prompt + "\n" + context

Then the model will act like a pre-trained language model and keep outputting self-curated questions and answers after the CoT and the answer to the original question. If we parse the first answer that the model generates (which the current code already does), the accuracy is 80.
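The workaround described above can be collected into one function; a minimal sketch, assuming the few-shot examples are delimited by the literal strings `Question:` and `Let's think step by step` (the helper name `build_gsm_prompt` is hypothetical, not from the repo):

```python
def build_gsm_prompt(document: str, inst: str, sys_prompt: str) -> str:
    """Insert LLaMA 3 role tokens around every few-shot example in a
    GSM-style prompt, as described in the comment above."""
    context = document + "\n\n" + inst
    # Each "Question:" opens a new user turn.
    context = context.replace(
        "Question:",
        "<|eot_id|><|start_header_id|>user<|end_header_id|>\nQuestion:",
    )
    # Each CoT trigger closes the user turn and opens an assistant turn.
    context = context.replace(
        "Let's think step by step",
        "Let's think step by step\n"
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>",
    )
    message = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>"
    message += "\n" + sys_prompt
    message += context
    return message
```

Because `str.replace` substitutes every occurrence, each few-shot example in the document gets its own user/assistant turn, not just the final question.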
I cannot reproduce the llama3-8b result following your advice; I just got:
{'exact_match': 53.9604, 'num_predicted': 202, 'mean_prediction_length_characters': 1.0, 'LEval_score': 53.9604, 'display_keys': ['exact_match'], 'display': [53.9604]}
Here is my code:
python Baselines/llama2-chat-test.py
--metric exam_eval
--task_name quality
--max_length 4k
and I changed llama2-chat-test.py as follows:
elif args.metric == "exam_eval":
context = "Document is as follows. {document} \nQuestion: {inst}. Please directly give the answer without any additional output or explanation "
message = "<|begin_of_text|>" + sys_prompt  # B_INST + B_SYS + sys_prompt + E_SYS + context + E_INST
message += "\nAnswer:"
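One thing worth checking in the snippet above: `context` is built from the template but never appended to `message`, so the document and question may never reach the model. A sketch of an assembly that does include it, following the template from the earlier comments (purely illustrative, not the repo's code; `build_quality_prompt` is a made-up name):

```python
def build_quality_prompt(sys_prompt: str, document: str, inst: str) -> str:
    """Fill the exam_eval template and append it to the LLaMA 3
    prompt so the document actually reaches the model."""
    context = (
        "Document is as follows. {document} \nQuestion: {inst}. "
        "Please directly give the answer without any additional "
        "output or explanation "
    ).format(document=document, inst=inst)
    message = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>"
    message += "\n" + sys_prompt
    message += "<|eot_id|><|start_header_id|>user<|end_header_id|>"
    message += "\n" + context
    message += "<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    message += "\nAnswer:"
    return message
```

A `mean_prediction_length_characters` of 1.0 in the reported metrics would be consistent with the model answering from the system prompt alone, which is why checking that `context` is concatenated in is a reasonable first step.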