Commit

updated config and read me
joesharratt1229 committed Feb 25, 2025
1 parent 7b39f4a commit 52c3c43
Showing 2 changed files with 42 additions and 6 deletions.
44 changes: 40 additions & 4 deletions eval/README.md
@@ -47,9 +47,8 @@ dataset_seed: 42
developer_role: system

```
-For example the following file will run an evaluation for deepseek r1 for algorithmic datasets
-```yaml
+For example the following file will run an evaluation for deepseek r1 for algorithmic datasets.
+``` yaml
model: deepseek/deepseek-r1
category: algorithmic
datasets:
@@ -84,7 +83,44 @@ eval_dir: eval/r1
dataset_size: 50
dataset_seed: 45
developer_role: system

```
The following would run Claude 3.5 on the algorithmic dataset.
```yaml
model: anthropic/claude-3.5-sonnet
category: algorithmic
provider: Anthropic
datasets:
- count_primes
- game_of_life
- graph_color
- group_anagrams
- isomorphic_strings
- letter_counting
- letter_jumble
- manipulate_matrix
- number_filtering
- number_sorting
- palindrome
- pool_matrix
- ransom_note
- rotate_matrix
- sentence_reordering
- spell_backward
- spiral_matrix
- string_insertion
- string_manipulation
- string_synthesis
- word_ladder
- word_sequence_reversal
- word_sorting
eval_dir: eval/r1
dataset_size: 50
dataset_seed: 45
developer_role: system
```
Here you specify individual model and provider
### Running Evaluations
@@ -98,4 +134,4 @@ python eval.py --yaml yaml/algorithmic.yaml
```


-The results of your model run on a dataset will be stored in a new folder in the directory E.g `r1/algorithmic/proposition_logic.json`
+The results of individual model on a dataset will be stored in a new folder in the directory E.g `r1/algorithmic/proposition_logic.json`
4 changes: 2 additions & 2 deletions eval/eval_config.py
@@ -13,8 +13,8 @@ class EvalConfig:
eval_dir: str
dataset_size: int
dataset_seed: int
-model: str = "deepseek/deepseek-r1"
-provider: str = "Nebius"
+model: str
+provider: str
developer_role: str = "system"
developer_prompt: str = SYSTEM_PROMPTS["DeepSeekZero"]
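
The practical effect of this hunk is that `model` and `provider` lose their defaults, so a config that omits either one should now fail at construction time instead of silently falling back to DeepSeek R1 on Nebius. The following is a minimal sketch of that behaviour, not the repository's actual class: `EvalConfigSketch`, the `category` and `datasets` fields (guessed from the README example), the placeholder `SYSTEM_PROMPTS` value, and the `required_fields` helper are all assumptions made for illustration, since the diff elides the top of `EvalConfig`.

```python
from dataclasses import MISSING, dataclass, fields
from typing import List

# Placeholder for the repository's SYSTEM_PROMPTS mapping; the real prompt
# text is not shown in this diff.
SYSTEM_PROMPTS = {"DeepSeekZero": "<placeholder prompt>"}


@dataclass
class EvalConfigSketch:
    # Hypothetical leading fields, inferred from the README's YAML example.
    category: str
    datasets: List[str]
    # Fields visible in the diff.
    eval_dir: str
    dataset_size: int
    dataset_seed: int
    model: str      # no default after this commit
    provider: str   # no default after this commit
    developer_role: str = "system"
    developer_prompt: str = SYSTEM_PROMPTS["DeepSeekZero"]


def required_fields(cls) -> List[str]:
    """Return the names of dataclass fields that have no default value."""
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


# A config mirroring the Claude example from the README (dataset list shortened).
complete = {
    "model": "anthropic/claude-3.5-sonnet",
    "provider": "Anthropic",
    "category": "algorithmic",
    "datasets": ["count_primes", "game_of_life"],
    "eval_dir": "eval/r1",
    "dataset_size": 50,
    "dataset_seed": 45,
}
config = EvalConfigSketch(**complete)

# Dropping `provider` now raises TypeError rather than defaulting to "Nebius".
incomplete = {k: v for k, v in complete.items() if k != "provider"}
try:
    EvalConfigSketch(**incomplete)
    missing_provider_rejected = False
except TypeError:
    missing_provider_rejected = True
```

If the real loader passes the parsed YAML straight into the dataclass, existing config files that relied on the old defaults would need `model` and `provider` added; whether that is how `eval.py` builds its config is not visible in this commit.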
