From c0a1569b8ef5708093cef091d8eb2b993067d4f1 Mon Sep 17 00:00:00 2001
From: Jan
Date: Mon, 4 Mar 2024 21:27:31 +0100
Subject: [PATCH] Add documentation

---
 README.md      |  1 +
 docs/README.md | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/README.md b/README.md
index 07250453a..1a920cea6 100644
--- a/README.md
+++ b/README.md
@@ -38,6 +38,7 @@ Below are the features and tasks of this framework:
 - [SantaCoder-FIM](https://huggingface.co/datasets/bigcode/santacoder-fim-task) for evaluating FIM on **Python** code using Exact Match. Further details are described in [SantaCoder](https://arxiv.org/abs/2301.03988). Includes two tasks:
   - `StarCoderFIM`: which uses the default FIM tokens `"", "", ""`, and
   - `SantaCoderFIM`: which uses SantaCoder FIM tokens `"", "", ""`
+  - Shadereval for **GLSL** code understanding ([task1](https://huggingface.co/spaces/Vipitis/ShaderEval)) and generation ([task2](https://huggingface.co/spaces/Vipitis/shadermatch))
 
 More details about each task can be found in the documentation in [`docs/README.md`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md).
 ## Setup

diff --git a/docs/README.md b/docs/README.md
index 9043a6c4c..d1a502130 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -382,6 +382,46 @@ accelerate launch main.py \
   --allow_code_execution
 ```
 
+### Shadereval
+[Shadereval](tbd.) explores "creative" code generation. Fragment shaders are sourced from Shadertoy.com and curated into the [Shadertoys](https://huggingface.co/datasets/Vipitis/Shadertoys) dataset. The task-specific datasets are built from the Shadertoys dataset and therefore share a common train/test split.
+
+Task-1: **ReturnCompletion** provides a function header and body, and the model generates a matching return statement. Generations are evaluated with `exact-match`, so this task does not require code execution. The original publication uses greedy decoding and only 300 samples.
+
+```bash
+accelerate launch main.py \
+  --model \
+  --tasks shadereval-1 \
+  --n_samples 300 \
+  --do_sample False
+```
+
+Task-2: **FunctionGeneration** takes the comments directly before or after the function header as model input. The model is expected to generate a complete function that is syntactically sound. Generated functions are inserted into the original shader program for evaluation. A custom metric, hosted in the [demo space](https://huggingface.co/spaces/Vipitis/shadermatch), renders frames for comparison. This requires the additional dependency [wgpu-shadertoy](https://github.com/pygfx/shadertoy). It is recommended to run generation first and evaluate the saved generations later.
+The reference results use greedy decoding and fp16 for the first 300 examples.
+
+```bash
+accelerate launch main.py \
+  --model \
+  --tasks shadereval-2 \
+  --generation_only \
+  --save_generations_path "saved_generations.json" \
+  --allow_code_execution \
+  --limit 300 \
+  --do_sample False \
+  --precision fp16
+```
+
+To evaluate the saved generations later, run:
+
+```bash
+accelerate launch main.py \
+  --model \
+  --tasks shadereval-2 \
+  --load_generations_path "saved_generations.json" \
+  --allow_code_execution \
+  --limit 300 \
+  --metric_output_path "eval_results.json" \
+  --precision fp16
+```
+
 ## Code generation benchmarks without unit tests
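As a side note on the Task-1 scoring described above: `exact-match` simply compares generated strings against references, so it can be sketched in a few lines. This is a minimal illustration only; the function name, the per-problem candidate-list layout, and the whitespace stripping are assumptions for the example, not the harness's actual API.

```python
def exact_match_rate(generations, references):
    # Fraction of problems whose first candidate equals the reference,
    # comparing strings after stripping surrounding whitespace.
    hits = sum(
        1 for cands, ref in zip(generations, references)
        if cands and cands[0].strip() == ref.strip()
    )
    return hits / len(references)

# Hypothetical layout: one list of candidate completions per problem,
# as a "saved_generations.json"-style structure might hold.
generations = [["return x + y;"], ["return length(v);"]]
references = ["return x + y;", "return normalize(v);"]
print(exact_match_rate(generations, references))  # 0.5
```

Because the comparison is purely textual, no shader code is executed for Task-1, which is why the command above omits `--allow_code_execution`.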