The loss was initially decreasing until it reached NaNs and stayed there. I am running it on the SQuAD dataset, and the exact command used for running it is:

python train.py --train_tasks squad --device 0 --data ./.data --save ./results/ --embeddings ./.embeddings/ --train_batch_tokens 2000

So the only change is --train_batch_tokens set to 2000, since my GPU was running out of memory. I am attaching a screenshot. Is there anything I am missing? Should I try something else?
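To pin down the first iteration where the loss goes non-finite, here is a minimal sketch of a NaN guard around the loss in a plain PyTorch training loop; the toy model, optimizer, and data below are placeholders, not decaNLP's actual training code:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model, optimizer, and batches; only the guard inside the
# loop is the point of this sketch.
model = nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batches = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(100)]

for iteration, (x, y) in enumerate(batches):
    opt.zero_grad()
    loss = F.mse_loss(model(x), y)
    if not math.isfinite(loss.item()):
        # Record the first bad iteration and skip the update so a NaN
        # gradient does not corrupt the weights.
        print(f"Non-finite loss at iteration {iteration}")
        continue
    loss.backward()
    opt.step()

Wrapping decaNLP's own loss in the same check would at least confirm the exact iteration where the NaNs begin.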
Well that's no good. Let me try running your exact command on my side to see if I get the same thing. Do you know which iteration this first started on? Is it 438000?
I had the same issue when I ran:

nvidia-docker run -it --rm -v $(pwd):/decaNLP/ -u $(id -u):$(id -g) bmccann/decanlp:cuda9_torch041 bash -c "python /decaNLP/train.py --train_tasks squad --device 0"

It started at iteration_316800.
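If the divergence is tied to a smaller effective batch, one common mitigation (not necessarily what decaNLP already does internally) is to clip the gradient norm before each optimizer step; a minimal sketch with a placeholder model and an assumed clip value of 1.0:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model and optimizer; the relevant call is clip_grad_norm_.
model = nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y = torch.randn(4, 10), torch.randn(4, 1)
loss = F.mse_loss(model(x), y)
loss.backward()
# Rescale gradients so their global L2 norm is at most 1.0 (assumed value)
# before stepping, which often keeps a small-batch run from blowing up.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()

Lowering the learning rate is another knob worth trying when shrinking the effective batch size.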