I see that for Llama 3 8B, the LLM generates made-up questions and answers after solving the original problem, which causes the extracted answers to be wrong. Why does this happen? Is there a way to fix it?
Yes, I think this is caused by the EOS/stop-token setting. For example, in some applications we set "\n" as the stop token so that the model doesn't continue making up new questions after it produces the expected output for the current question. But tokenizers differ across LLMs: in a new model, ".\n" or ".\n\n" may be tokenized as a single token, in which case the standalone "\n" stop token is never matched.
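To illustrate, here's a quick check you can run with the Hugging Face `transformers` tokenizer (the model ID below is just an example; substitute whichever LLM you are actually serving):

```python
from transformers import AutoTokenizer

# Example model ID -- replace with the model you are running.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# See how candidate stop strings are split into tokens.
for s in ["\n", ".\n", ".\n\n"]:
    ids = tok.encode(s, add_special_tokens=False)
    print(repr(s), "->", ids, tok.convert_ids_to_tokens(ids))

# If ".\n\n" comes back as a single token, a stop condition that only
# matches the standalone "\n" token will never fire.
```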
There are a couple of workarounds: (1) play with the tokenizer (as in the snippet above) to find the appropriate stop token(s) for the LLM you're using, or (2) switch to another LLM engine such as SGLang or vLLM, which can stop on decoded strings rather than single token IDs (see the sketch below).
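If you go the vLLM route, its `SamplingParams` accepts string-level stop sequences, which sidesteps the tokenization issue. A minimal sketch, where the model ID, prompt, and stop strings are placeholders for your own setup:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

# `stop` matches on the decoded text, so it triggers even if the newline
# is fused with other characters into a single token.
params = SamplingParams(max_tokens=512, temperature=0.0,
                        stop=["\nQuestion:", "\n\n"])

outputs = llm.generate(["Q: What is 17 * 24?\nA:"], params)
print(outputs[0].outputs[0].text)
```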
Could you share the script you are running and the line that causes the error? If I've seen a similar problem before, I may be able to offer some quick suggestions.