Add documentation
Vipitis committed Mar 4, 2024
1 parent f1b6c5a commit c0a1569
Showing 2 changed files with 41 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -38,6 +38,7 @@ Below are the features and tasks of this framework:
- [SantaCoder-FIM](https://huggingface.co/datasets/bigcode/santacoder-fim-task) for evaluating FIM on **Python** code using Exact Match. Further details are described in [SantaCoder](https://arxiv.org/abs/2301.03988). Includes two tasks:
- `StarCoderFIM`: which uses the default FIM tokens `"<fim_prefix>", "<fim_middle>", "<fim_suffix>"`, and
- `SantaCoderFIM`: which uses SantaCoder FIM tokens `"<fim-prefix>", "<fim-middle>", "<fim-suffix>"`
- Shadereval for **GLSL** code understanding ([task1](https://huggingface.co/spaces/Vipitis/ShaderEval)) and generation ([task2](https://huggingface.co/spaces/Vipitis/shadermatch))

More details about each task can be found in the documentation in [`docs/README.md`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md).
## Setup
40 changes: 40 additions & 0 deletions docs/README.md
@@ -382,6 +382,46 @@ accelerate launch main.py \
--allow_code_execution
```

### Shadereval
[Shadereval](tbd.) explores "creative" code generation. Fragment shaders are sourced from Shadertoy.com and curated into the [Shadertoys](https://huggingface.co/datasets/Vipitis/Shadertoys) dataset. The task-specific datasets are built from the Shadertoys dataset and therefore share a common train/test split.

Task-1: **ReturnCompletion** provides a function header and body as the prompt, and the model generates a matching return statement. Generations are evaluated with `exact-match`, so this task does not require code execution. The original publication uses greedy decoding and only 300 samples.

```bash
accelerate launch main.py \
--model <MODEL_NAME> \
--tasks shadereval-1 \
--limit 300 \
--do_sample False
```
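For intuition, the metric counts a generation as correct only if it matches the reference return statement verbatim. A minimal sketch of this scoring idea (an illustrative helper, not the harness's actual implementation; it assumes only surrounding whitespace is normalized):

```python
# Illustrative sketch of exact-match scoring, NOT the harness's implementation:
# a generated return statement counts only if it equals the reference verbatim
# after trimming surrounding whitespace.
def exact_match(predictions, references):
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

print(exact_match(["return fragColor;", "return vec3(0.0);"],
                  ["return fragColor;", "return color;"]))  # 0.5
```

This is why the task needs no code execution: string comparison alone decides correctness.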

Task-2: **FunctionGeneration** uses comments directly before or after the function header as model input. The model is expected to generate a complete function that is syntactically sound. Generated functions are inserted into the original shader program for evaluation. A custom metric, hosted in the [demo space](https://huggingface.co/spaces/Vipitis/shadermatch), renders frames of both shaders and compares them. This requires an additional dependency, [wgpu-shadertoy](https://github.com/pygfx/shadertoy). It is recommended to run generation first and evaluate the saved generations later.
The reference uses greedy decoding and fp16 for the first 300 examples.

```bash
accelerate launch main.py \
--model <MODEL_NAME> \
--tasks shadereval-2 \
--generation_only \
--save_generations_path "saved_generations.json" \
--allow_code_execution \
--limit 300 \
--do_sample False \
--precision fp16
```

To evaluate the saved generations later, run:

```bash
accelerate launch main.py \
--model <MODEL_NAME> \
--tasks shadereval-2 \
--load_generations_path "saved_generations.json" \
--allow_code_execution \
--limit 300 \
--metric_output_path "eval_results.json" \
--precision fp16
```
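Because generation and evaluation are decoupled through the JSON file, the intermediate generations can be inspected before the (slower) rendering-based evaluation. A minimal sketch, assuming the layout is one list of candidate strings per problem (an assumption about the file format, and the GLSL snippets below are made up for illustration):

```python
import json

# Assumed layout: --save_generations_path writes one list of candidate
# generations per problem. The shader functions here are dummy examples.
sample = [
    ["float brightness(vec3 c){return dot(c, vec3(0.299, 0.587, 0.114));}"],
    ["vec3 invert(vec3 c){return 1.0 - c;}"],
]
with open("saved_generations.json", "w") as f:
    json.dump(sample, f)

# Load it back the way a later evaluation run would.
with open("saved_generations.json") as f:
    generations = json.load(f)

print(len(generations))     # number of problems
print(len(generations[0]))  # candidates for the first problem
```

A quick sanity check like this can catch truncated or empty generations before spending time on frame rendering.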


## Code generation benchmarks without unit tests

