
Unexpected Output from MoBA on RunPod - Attention Mask Warnings & Repetitive Text #12


ericblue commented Feb 25, 2025

Hi, I'm running MoBA on RunPod using a high-performance GPU instance (1× H100 SXM, 26 vCPU, 251 GB RAM), but the model is generating nonsensical, repetitive output. I am also seeing warnings related to attention masks and sampling settings. The issue persists across different prompts.

I followed the install instructions in the README. No torch version was listed in requirements.txt, so I installed it manually:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Uname: Linux ffe21a18248e 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Python: 3.10.16
Torch Version: 2.6.0+cu118
CUDA Available: True
CUDA Version: 11.8
CUDNN Version: 90100

Pip freeze output:

accelerate==1.4.0
certifi==2025.1.31
charset-normalizer==3.4.1
einops==0.8.1
exceptiongroup==1.2.2
filelock==3.13.1
flash_attn==2.6.3
fsspec==2024.6.1
huggingface-hub==0.29.1
idna==3.10
iniconfig==2.0.0
Jinja2==3.1.4
MarkupSafe==2.1.5
moba @ file:///app/MoBA
mpmath==1.3.0
networkx==3.3
numpy==2.1.2
nvidia-cublas-cu11==11.11.3.6
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cudnn-cu11==9.1.0.70
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.3.0.86
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusparse-cu11==11.7.5.86
nvidia-nccl-cu11==2.21.5
nvidia-nvtx-cu11==11.8.86
packaging==24.2
pillow==11.0.0
pluggy==1.5.0
psutil==7.0.0
pytest==8.3.4
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
sympy==1.13.1
tokenizers==0.21.0
tomli==2.2.1
torch==2.6.0+cu118
torchaudio==2.6.0+cu118
torchvision==0.21.0+cu118
tqdm==4.67.1
transformers==4.49.0
triton==3.2.0
typing_extensions==4.12.2
urllib3==2.3.0

Command used to run the model:

python3 examples/llama.py --model meta-llama/Llama-3.1-8B --attn moba

Output:

[00:14<00:00,  3.67s/it]
/root/miniconda/envs/moba/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:629: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/root/miniconda/envs/moba/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:634: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
tensor([[128000,   5269,    527,    499,     30,    358,   1097,   7060,     13,
            358,   1097,    264,   5575,     13,    358,   1097,  21630,    304,
            264,  12374,     13,    358,   1097,  21630,    304,    264,  12374,
             13,    358,   1097,  21630,    304]], device='cuda:0')
<|begin_of_text|>how are you? I am fine. I am a student. I am studying in a university. I am studying in a university. I am studying in

I'm assuming the attention mask warnings may be causing the repetitive output.
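For reference, here is a minimal sketch of what I believe the warnings are asking for: pass an explicit attention_mask and pad_token_id to generate(), and enable sampling so temperature/top_p actually take effect. This is an assumption on my part; it loads the vanilla Hugging Face model rather than going through MoBA's attention patch, and I haven't checked how examples/llama.py builds its inputs, so it only illustrates the generate() arguments:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="cuda"
)

# Tokenize with an explicit attention mask; Llama 3.1 has no dedicated pad
# token (pad would equal eos), so generate() cannot infer the mask on its own.
inputs = tokenizer("how are you?", return_tensors="pt").to("cuda")

output = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # addresses the mask warning
    pad_token_id=tokenizer.eos_token_id,      # addresses the pad-token warning
    do_sample=True,                           # makes temperature/top_p apply
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=32,
)
print(tokenizer.decode(output[0]))

Even if the mask warning turns out to be benign for a single unpadded prompt, greedy decoding (do_sample=False) commonly loops on short prompts like this, so enabling sampling alone might reduce the repetition.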

Are there specific MoBA settings that should be adjusted? Could this be related to a Torch or CUDA version mismatch? What versions are recommended for best performance?

Any guidance on resolving these issues would be appreciated.
