Add documentation
Vipitis committed Mar 4, 2024
1 parent f1b6c5a commit c0a1569
Showing 2 changed files with 41 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -38,6 +38,7 @@ Below are the features and tasks of this framework:
- [SantaCoder-FIM](https://huggingface.co/datasets/bigcode/santacoder-fim-task) for evaluating FIM on **Python** code using Exact Match. Further details are described in [SantaCoder](https://arxiv.org/abs/2301.03988). Includes two tasks:
- `StarCoderFIM`: which uses the default FIM tokens `"<fim_prefix>", "<fim_middle>", "<fim_suffix>"`, and
- `SantaCoderFIM`: which uses SantaCoder FIM tokens `"<fim-prefix>", "<fim-middle>", "<fim-suffix>"`
- Shadereval for **GLSL** code understanding ([task1](https://huggingface.co/spaces/Vipitis/ShaderEval)) and generation ([task2](https://huggingface.co/spaces/Vipitis/shadermatch))

More details about each task can be found in the documentation in [`docs/README.md`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md).
## Setup
40 changes: 40 additions & 0 deletions docs/README.md
@@ -382,6 +382,46 @@ accelerate launch main.py \
--allow_code_execution
```

### Shadereval
[Shadereval](tbd.) explores "creative" code generation. Fragment shaders are sourced from Shadertoy.com and curated into the [Shadertoys](https://huggingface.co/datasets/Vipitis/Shadertoys) dataset. The task-specific datasets are built from the Shadertoys dataset and therefore share a common train/test split.

Task-1: **ReturnCompletion** provides a function header and body as the prompt, and the model generates a matching return statement. Generations are evaluated with `exact-match`, so this task does not require code execution. The original publication uses greedy decoding and only 300 samples.

```bash
accelerate launch main.py \
--model <MODEL_NAME> \
--tasks shadereval-1 \
--limit 300 \
--do_sample False
```
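For intuition, the metric counts a generation as correct only if it matches the reference return statement verbatim. A minimal sketch of this scoring idea (an illustrative helper, not the harness's actual implementation; it assumes only surrounding whitespace is normalized):

```python
# Illustrative sketch of exact-match scoring, NOT the harness's implementation:
# a generated return statement counts only if it equals the reference verbatim
# after trimming surrounding whitespace.
def exact_match(predictions, references):
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

print(exact_match(["return fragColor;", "return vec3(0.0);"],
                  ["return fragColor;", "return color;"]))  # 0.5
```

This is why the task needs no code execution: string comparison alone decides correctness.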

Task-2: **FunctionGeneration** uses comments directly before or after the function header as model input. The model is expected to generate a complete function that is syntactically sound. Generated functions are inserted into the original shader program for evaluation. A custom metric, hosted in the [demo space](https://huggingface.co/spaces/Vipitis/shadermatch), renders frames of both shaders and compares them. This requires an additional dependency, [wgpu-shadertoy](https://github.com/pygfx/shadertoy). It is recommended to run generation first and evaluate the saved generations later.
The reference uses greedy decoding and fp16 for the first 300 examples.

```bash
accelerate launch main.py \
--model <MODEL_NAME> \
--tasks shadereval-2 \
--generation_only \
--save_generations_path "saved_generations.json" \
--allow_code_execution \
--limit 300 \
--do_sample False \
--precision fp16
```

To evaluate the saved generations later, run:

```bash
accelerate launch main.py \
--model <MODEL_NAME> \
--tasks shadereval-2 \
--load_generations_path "saved_generations.json" \
--allow_code_execution \
--limit 300 \
--metric_output_path "eval_results.json" \
--precision fp16
```
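Because generation and evaluation are decoupled through the JSON file, the intermediate generations can be inspected before the (slower) rendering-based evaluation. A minimal sketch, assuming the layout is one list of candidate strings per problem (an assumption about the file format, and the GLSL snippets below are made up for illustration):

```python
import json

# Assumed layout: --save_generations_path writes one list of candidate
# generations per problem. The shader functions here are dummy examples.
sample = [
    ["float brightness(vec3 c){return dot(c, vec3(0.299, 0.587, 0.114));}"],
    ["vec3 invert(vec3 c){return 1.0 - c;}"],
]
with open("saved_generations.json", "w") as f:
    json.dump(sample, f)

# Load it back the way a later evaluation run would.
with open("saved_generations.json") as f:
    generations = json.load(f)

print(len(generations))     # number of problems
print(len(generations[0]))  # candidates for the first problem
```

A quick sanity check like this can catch truncated or empty generations before spending time on frame rendering.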


## Code generation benchmarks without unit tests

