I have the following timings for a 30B LLaMA model:
```
llama_print_timings:        load time =  62270.78 ms
llama_print_timings:      sample time =    681.50 ms /   203 runs   (    3.36 ms per run)
llama_print_timings: prompt eval time =  60647.60 ms /   323 tokens (  187.76 ms per token)
llama_print_timings:        eval time =  46631.52 ms /   202 runs   (  230.85 ms per run)
llama_print_timings:       total time = 109586.98 ms
```
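For reference, the per-run and per-token averages follow directly from the reported totals. Here is a minimal sketch, using only the figures printed above, that re-derives them:

```python
# Re-derive the per-run / per-token averages from the reported totals
# (all numbers are taken from the llama_print_timings output above).
timings = {
    "sample":      (681.50, 203),    # total ms, runs
    "prompt eval": (60647.60, 323),  # total ms, tokens
    "eval":        (46631.52, 202),  # total ms, runs
}

for name, (total_ms, count) in timings.items():
    print(f"{name:>11}: {total_ms / count:7.2f} ms per token")
# sample: 3.36 ms, prompt eval: 187.76 ms, eval: 230.85 ms
```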
Although the time per token is reasonable, the load time is significant. Do you see similar timings? I ran this on an M1 Max with 64 GB of RAM.
I ask because I have not seen such a long load time mentioned in other discussions, and I am wondering whether I have missed something.
I am using commit d2beca9, if that helps, but this long load time has been present for a while.