
Unexpected Output from MoBA on RunPod - Attention Mask Warnings & Repetitive Text #12


ericblue commented Feb 25, 2025

Hi, I'm running MoBA on RunPod using a high-performance GPU instance (1× H100 SXM, 26 vCPU, 251 GB RAM), but the model is generating nonsensical, repetitive output. I am also seeing warnings related to attention masks and sampling settings. The issue persists across different prompts.

I followed the install instructions in the README. No torch version was listed in requirements.txt, so I installed it manually:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Uname: Linux ffe21a18248e 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Python: 3.10.16
Torch Version: 2.6.0+cu118
CUDA Available: True
CUDA Version: 11.8
CUDNN Version: 90100

Pip freeze output:

accelerate==1.4.0
certifi==2025.1.31
charset-normalizer==3.4.1
einops==0.8.1
exceptiongroup==1.2.2
filelock==3.13.1
flash_attn==2.6.3
fsspec==2024.6.1
huggingface-hub==0.29.1
idna==3.10
iniconfig==2.0.0
Jinja2==3.1.4
MarkupSafe==2.1.5
moba @ file:///app/MoBA
mpmath==1.3.0
networkx==3.3
numpy==2.1.2
nvidia-cublas-cu11==11.11.3.6
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cudnn-cu11==9.1.0.70
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.3.0.86
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusparse-cu11==11.7.5.86
nvidia-nccl-cu11==2.21.5
nvidia-nvtx-cu11==11.8.86
packaging==24.2
pillow==11.0.0
pluggy==1.5.0
psutil==7.0.0
pytest==8.3.4
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
sympy==1.13.1
tokenizers==0.21.0
tomli==2.2.1
torch==2.6.0+cu118
torchaudio==2.6.0+cu118
torchvision==0.21.0+cu118
tqdm==4.67.1
transformers==4.49.0
triton==3.2.0
typing_extensions==4.12.2
urllib3==2.3.0

Command used to run the model:

python3 examples/llama.py --model meta-llama/Llama-3.1-8B --attn moba

Output:

[00:14<00:00,  3.67s/it]
/root/miniconda/envs/moba/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:629: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/root/miniconda/envs/moba/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:634: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
tensor([[128000,   5269,    527,    499,     30,    358,   1097,   7060,     13,
            358,   1097,    264,   5575,     13,    358,   1097,  21630,    304,
            264,  12374,     13,    358,   1097,  21630,    304,    264,  12374,
             13,    358,   1097,  21630,    304]], device='cuda:0')
<|begin_of_text|>how are you? I am fine. I am a student. I am studying in a university. I am studying in a university. I am studying in

I'm assuming the attention mask warnings may be causing the repetitive output.
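For reference, here is a minimal sketch of what I believe the warnings are asking for: pass an explicit attention_mask and pad_token_id to generate(), and enable sampling so temperature/top_p actually take effect. This is an assumption on my part; it loads the vanilla Hugging Face model rather than going through MoBA's attention patch, and I haven't checked how examples/llama.py builds its inputs, so it only illustrates the generate() arguments:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="cuda"
)

# Tokenize with an explicit attention mask; Llama 3.1 has no dedicated pad
# token (pad would equal eos), so generate() cannot infer the mask on its own.
inputs = tokenizer("how are you?", return_tensors="pt").to("cuda")

output = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # addresses the mask warning
    pad_token_id=tokenizer.eos_token_id,      # addresses the pad-token warning
    do_sample=True,                           # makes temperature/top_p apply
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=32,
)
print(tokenizer.decode(output[0]))

Even if the mask warning turns out to be benign for a single unpadded prompt, greedy decoding (do_sample=False) commonly loops on short prompts like this, so enabling sampling alone might reduce the repetition.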

Are there specific MoBA settings that should be adjusted? Could this be related to a Torch or CUDA version mismatch? What versions are recommended for best performance?

Any guidance on resolving these issues would be appreciated.
