inference

History

Name		Name	Last commit message	Last commit date
parent directory ..
compression		compression
efficiency		efficiency
eval		eval
NOTES.md		NOTES.md
README.md		README.md
__init__.py		__init__.py
requirements.txt		requirements.txt

README.md

LLM Inference

Compress

Run the following:

bash compression/run_olmo_quantization.sh /net/nfs.cirrascale/allennlp/akshitab/olmo-models/olmo-1b quantized_olmo-1b

Run accuracy benchmark

Run the following:

bash eval/mmlu/eval_on_mmlu.sh quantized_olmo-1b /net/nfs.cirrascale/allennlp/akshitab/olmo-models/olmo-1b /net/nfs.cirrascale/allennlp/akshitab/data/mmlu eval_results

Output format:

Average accuracy 0.202 - math
Average accuracy 0.232 - health
Average accuracy 0.219 - physics
Average accuracy 0.270 - business
Average accuracy 0.198 - biology
Average accuracy 0.172 - chemistry
Average accuracy 0.267 - computer science
Average accuracy 0.204 - economics
Average accuracy 0.234 - engineering
Average accuracy 0.238 - philosophy
Average accuracy 0.236 - other
Average accuracy 0.233 - history
Average accuracy 0.177 - geography
Average accuracy 0.204 - politics
Average accuracy 0.225 - psychology
Average accuracy 0.250 - culture
Average accuracy 0.250 - law
Average accuracy 0.212 - STEM
Average accuracy 0.241 - humanities
Average accuracy 0.215 - social sciences
Average accuracy 0.238 - other (business, health, misc.)
Average accuracy: 0.229

Run efficiency benchmark

Run the following:

cd efficiency
bash run_olmo_efficiency_benchmark.sh /net/nfs.cirrascale/allennlp/akshitab/olmo-models/olmo-1b quantized_olmo-1b

Output format:

Time Elapsed: 500.91 s
Max GPU memory usage:  2.09 GiB.
Average GPU power:  9.00e+01 W.
Average power:  2.04e+02 W.
Total energy:  7.49e-02 kWh.
CO2 emission:  6.35e-03 kg.
Throughput:  0.20 instances / s.
Throughput:  47.30 words / s.
Latency:  5009.10 ms / batch.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

inference

inference

README.md

LLM Inference

Compress

Run accuracy benchmark

Run efficiency benchmark

Files

inference

Directory actions

More options

Directory actions

More options

Latest commit

History

inference

Folders and files

parent directory

README.md

LLM Inference

Compress

Run accuracy benchmark

Run efficiency benchmark