fix #2

hughbzhang · 2024-07-05T20:49:03Z

No description provided.

…489)
* model_type attribute error
  Getting attribute error when using a model without a 'model_type'
* fix w/ and w/out the 'model_type' specification
* use getattr(), also fix other config.model_type reference
* Update huggingface.py
---------
Co-authored-by: Hailey Schoelkopf
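The `getattr()` fix named in this commit can be illustrated with a minimal sketch. `DummyConfig` and `resolve_model_type` are hypothetical names introduced here for illustration, not code from the harness; the sketch only shows that `getattr` with a default avoids an `AttributeError` when a config object has no `model_type`.

```python
class DummyConfig:
    """Hypothetical stand-in for a model config object (not a real library class)."""

    def __init__(self, **attrs):
        for name, value in attrs.items():
            setattr(self, name, value)


def resolve_model_type(config, default=None):
    # Direct access (config.model_type) raises AttributeError when the
    # attribute is missing; getattr with a third argument returns the
    # default instead.
    return getattr(config, "model_type", default)


with_type = DummyConfig(model_type="llama")
without_type = DummyConfig()

print(resolve_model_type(with_type))
print(resolve_model_type(without_type, default="unknown"))
```

Callers can then branch on the returned value instead of wrapping every access in `try`/`except`.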

* use `@ray.remote` with distributed vLLM
* update versions
* bugfix
* unpin vllm
* fix pre-commit
* added version assertion error
* Revert "added version assertion error"
  This reverts commit 8041e9b78e95eea9f4f4d0dc260115ba8698e9cc.
* added version assertion for DP
* expand DP note
* add warning
* nit
* pin vllm
* fix typos

…ity (#1487)
* setting trust_remote_code
* dataset list no notebooks
* respect trust remote code
* Address changes, move cli options and change datasets
* fix task for tests
* headqa
* remove kobest
* pin datasets and address comments
* clean up space

* add french-bench
* rename arc easy
* linting
* update datasets for no remote code exec
* fix string delimiter
* add info to readme
* trim trailing whitespace
* add detailed groups
* add info to readme
* remove orangesum title from fbench main
* Force PPL tasks to be 0-shot
---------
Co-authored-by: Hailey Schoelkopf

* Add new tasks of GPQA
* Add README
* Remove unused functions
* Remove unused functions
* Linters
* Add flexible match
* update
* Remove duplicate function
* Linter
* update
* Update lm_eval/filters/extraction.py
  Co-authored-by: Hailey Schoelkopf
* register multi_choice_regex
* Update
* run precommit
---------
Co-authored-by: Hailey Schoelkopf
Co-authored-by: haileyschoelkopf

…533)
* Remove unused `decontamination_ngrams_path` and all mentions (still no alternative path provided)
* Fix improper import of LM and usage of evaluator in one of scripts
* update type hints in instance and task api
* raising errors in task.py instead of asserts
* Fix warnings from ruff
* raising errors in __main__.py instead of asserts
* raising errors in tasks/__init__.py instead of asserts
* raising errors in evaluator.py instead of asserts
* evaluator: update type hints and remove unused variables in code
* Update lm_eval/__main__.py
  Co-authored-by: Hailey Schoelkopf
* Update lm_eval/__main__.py
  Co-authored-by: Hailey Schoelkopf
* Update lm_eval/api/task.py
  Co-authored-by: Hailey Schoelkopf
* Update lm_eval/api/task.py
  Co-authored-by: Hailey Schoelkopf
* Update lm_eval/api/task.py
  Co-authored-by: Hailey Schoelkopf
* Update lm_eval/evaluator.py
  Co-authored-by: Hailey Schoelkopf
* pre-commit induced fixes
---------
Co-authored-by: Hailey Schoelkopf
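The "raising errors instead of asserts" items above follow a common Python pattern, sketched below. `validate_num_fewshot` is a hypothetical function written for illustration and is not taken from the harness. The motivation is that `assert` statements are skipped entirely when Python runs with `-O`, whereas an explicit `raise` always executes and lets callers catch a specific exception type.

```python
def validate_num_fewshot(num_fewshot):
    """Hypothetical validator: reject negative few-shot counts explicitly."""
    # An `assert num_fewshot >= 0` here would be stripped under `python -O`.
    # An explicit check with `raise` runs in every mode.
    if num_fewshot < 0:
        raise ValueError(
            f"num_fewshot must be non-negative, got {num_fewshot}"
        )
    return num_fewshot


try:
    validate_num_fewshot(-1)
except ValueError as err:
    print(f"rejected: {err}")

print(validate_num_fewshot(5))
```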

…g document and, update wandb_args description (#1536)
* Update openai completions and docs/CONTRIBUTING.md
* Update wandb args description
* Update docs/interface.md
---------
Co-authored-by: Hailey Schoelkopf

lintangsutawika and others added 30 commits December 4, 2023 23:40

Merge pull request #1064 from EleutherAI/haileyschoelkopf-patch-3

6f76ee0

Update new_task_guide.md

f73c2bc

Update model_guide.md

d83fc51

Merge pull request #1066 from EleutherAI/haileyschoelkopf-patch-4

b957a08

Updating docs hyperlinks

run CI on main branch PRs

eeb9972

Unit tests run on main branch

f74c0fe

Merge branch 'main' into big-refactor_dp

e86e7b2

# Conflicts: # README.md

typo

19f745a

update README.md

f721c0f

Merge branch 'main' into haileyschoelkopf-patch-2

f7fb109

added _encode_pair

dc1c816

Merge pull request #1063 from EleutherAI/haileyschoelkopf-patch-2

f0b9649

Fiddling with READMEs, Reenable CI tests on `main`

add script to check vllm, hf equiv

f44aa85

nits

a4188e1

fix errors

b99ad79

fix z-score and print; rename script

8b74bea

nits

38e3b73

fix batch

1c62da1

Update _cot_fewshot_template_yaml

a6d28ea

BBH cot fewshot already has fewshot examples in the description. So num_fewshot needs to be set to 0 so that users won't mistakenly set other num_fewshot values.
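The reasoning in this commit message (a task whose prompt already embeds few-shot examples should pin `num_fewshot` to 0) can be sketched as follows. `resolve_num_fewshot` and the dictionary-shaped task config are assumptions made for illustration only; the harness's actual resolution logic may differ.

```python
import warnings


def resolve_num_fewshot(task_config, requested=None):
    """Hypothetical resolver: a value pinned in the task config wins."""
    pinned = task_config.get("num_fewshot")
    if pinned is not None:
        if requested is not None and requested != pinned:
            # The prompt already contains its own examples, so a different
            # user-supplied value is ignored rather than stacked on top.
            warnings.warn(
                f"task pins num_fewshot={pinned}; ignoring requested {requested}"
            )
        return pinned
    # Nothing pinned: fall back to the request, defaulting to zero-shot.
    return 0 if requested is None else requested


cot_fewshot_task = {"num_fewshot": 0}  # examples live in the description
plain_task = {}

print(resolve_num_fewshot(cot_fewshot_task, requested=5))
print(resolve_num_fewshot(plain_task, requested=5))
```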

Update _mmlu_flan_cot_fewshot_template_yaml

4e34a6e

Update _mmlu_flan_cot_zeroshot_template_yaml

8c05c6b

Update _zeroshot_template_yaml

361ba19

Update _fewshot_template_yaml

a2cc877

Update _cot_zeroshot_template_yaml

ba7ba91

Update minerva_math_algebra.yaml

ce079c9

formatting

965c533

Merge pull request #1074 from EleutherAI/lintangsutawika-patch-4

e5dfd03

Update _cot_fewshot_template_yaml

fixed enumeration

12f260c

Merge branch 'main' of https://github.com/EleutherAI/lm-evaluation-harness into patch-scrolls

eb834c9

implementing kmmlu

1b14602

richwardle and others added 25 commits February 27, 2024 18:31

fix duplicated kwargs in some model init (#1495)

b177c82

Add multilingual truthfulqa targets (#1499)

d272c19

always include EOS token in stopsequences if possible (#1480)

284dd80

Improve data-parallel request partitioning for VLLM (#1477)

27a3da9

* add undistribute + use more_itertools
* remove divide() util fn
* add more_itertools as dependency
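The idea behind this change (split requests round-robin across data-parallel replicas, then restore the original order) can be sketched without third-party dependencies. The commit itself relies on `more_itertools`; the `distribute` and `undistribute` functions below are simplified stand-ins written for illustration, and their bodies are assumptions rather than the harness's implementation.

```python
def distribute(n, items):
    """Round-robin split: part i receives items i, i + n, i + 2n, ..."""
    items = list(items)
    return [items[i::n] for i in range(n)]


def undistribute(parts):
    """Interleave round-robin parts back into the original order."""
    n = len(parts)
    total = sum(len(part) for part in parts)
    # Element k of the original sequence sits in part k % n at index k // n.
    return [parts[k % n][k // n] for k in range(total)]


requests = ["r0", "r1", "r2", "r3", "r4", "r5", "r6"]
shards = distribute(3, requests)

# Stand-in for per-replica generation: each shard is processed independently.
results = [[req.upper() for req in shard] for shard in shards]

print(shards)
print(undistribute(results))
```

Round-robin assignment keeps shard sizes within one item of each other, which is why an order-restoring inverse is needed after the replicas finish.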

modify WandbLogger to accept arbitrary kwargs (#1491)

ae79b12

* make `WandbLogger` init args optional
* nit
* nit
* nit
* move import warning to `WandbLogger`
* nit
* update docs
* nit

Cleaning up unused unit tests (#1516)

4eba9cf

Hotfix: fix TypeError in --trust_remote_code (#1517)

4582391

Fix minor edge cases (#951 #1503) (#1520)

292e581

* Fix padding
* Fix elif in model loading
* format

Openllm benchmark (#1526)

8a875e9

Add EQ-Bench as per #1459 (#1511)

c5acce0

* Start adding eq-bench
* Start adding to yaml and utils
* Get metric working
* Add README
* Handle cases where answer is not parseable
* Deal with unparseable answers and add percent_parseable metric
* Update README

Add WMDP Multiple-choice (#1534)

29b2b01

* init wmdp yaml file
* Add WMDP Multiple-choice
* fix linter issues
* Delete lm_eval/tasks/wmdp/_wmdp.yaml
---------
Co-authored-by: Lintang Sutawika

Adding new task : KorMedMCQA (#1530)

faee1ad

Update docs on LM.loglikelihood_rolling abstract method (#1532)

525b8f5

update printed num-fewshot ; prevent fewshots from erroneously being used by cot which hardcodes fewshot prompt (#1502)

0270505

GSM1k evaluation code

6b37263

Installation + renaming

cede055

LICENSE fix

c5c2ecc

fix

5198b8b

hughbzhang self-assigned this Jul 5, 2024

jackvaughanjr unassigned hughbzhang Nov 23, 2024

jeff-da closed this Nov 26, 2024

jeff-da force-pushed the main branch from d222a78 to 39294c6 Compare November 26, 2024 01:56

jeff-da deleted the fix branch November 26, 2024 01:56
