From c0a1569b8ef5708093cef091d8eb2b993067d4f1 Mon Sep 17 00:00:00 2001
From: Jan
Date: Mon, 4 Mar 2024 21:27:31 +0100
Subject: [PATCH] Add documentation

---
 README.md      |  1 +
 docs/README.md | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/README.md b/README.md
index 07250453a..1a920cea6 100644
--- a/README.md
+++ b/README.md
@@ -38,6 +38,7 @@ Below are the features and tasks of this framework:
 - [SantaCoder-FIM](https://huggingface.co/datasets/bigcode/santacoder-fim-task) for evaluating FIM on **Python** code using Exact Match. Further details are described in [SantaCoder](https://arxiv.org/abs/2301.03988). Includes two tasks:
   - `StarCoderFIM`: which uses the default FIM tokens `"", "", ""`, and
   - `SantaCoderFIM`: which uses SantaCoder FIM tokens `"", "", ""`
+  - Shadereval for **GLSL** code understanding ([task1](https://huggingface.co/spaces/Vipitis/ShaderEval)) and generation ([task2](https://huggingface.co/spaces/Vipitis/shadermatch))
 
 More details about each task can be found in the documentation in [`docs/README.md`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md).
 ## Setup

diff --git a/docs/README.md b/docs/README.md
index 9043a6c4c..d1a502130 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -382,6 +382,46 @@ accelerate launch main.py \
   --allow_code_execution
 ```
 
+### Shadereval
+[Shadereval](tbd.) explores "creative" code generation. Fragment shaders are sourced from Shadertoy.com and curated into the [Shadertoys](https://huggingface.co/datasets/Vipitis/Shadertoys) dataset. The task-specific datasets are built from the Shadertoys dataset and therefore share a common train/test split.
+
+Task-1: **ReturnCompletion** provides a function header and body, and the model generates a matching return statement. Generations are evaluated with `exact-match`, so this task does not require code execution. The original publication uses greedy decoding and only 300 samples.
+
+```bash
+accelerate launch main.py \
+  --model \
+  --tasks shadereval-1 \
+  --n_samples 300 \
+  --do_sample False
+```
+
+Task-2: **FunctionGeneration** takes the comments directly before or after the function header as model input. The model is expected to generate a complete function that is syntactically sound. Generated functions are inserted into the original shader program for evaluation. A custom metric, hosted in the [demo space](https://huggingface.co/spaces/Vipitis/shadermatch), renders frames for comparison. This requires the additional dependency [wgpu-shadertoy](https://github.com/pygfx/shadertoy). It is recommended to run generation first and evaluate the saved generations later.
+The reference results use greedy decoding and fp16 for the first 300 examples.
+
+```bash
+accelerate launch main.py \
+  --model \
+  --tasks shadereval-2 \
+  --generation_only \
+  --save_generations_path "saved_generations.json" \
+  --allow_code_execution \
+  --limit 300 \
+  --do_sample False \
+  --precision fp16
+```
+
+To evaluate the saved generations later, run:
+
+```bash
+accelerate launch main.py \
+  --model \
+  --tasks shadereval-2 \
+  --load_generations_path "saved_generations.json" \
+  --allow_code_execution \
+  --limit 300 \
+  --metric_output_path "eval_results.json" \
+  --precision fp16
+```
+
 ## Code generation benchmarks without unit tests
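As a side note on the Task-1 scoring described above: `exact-match` simply compares generated strings against references, so it can be sketched in a few lines. This is a minimal illustration only; the function name, the per-problem candidate-list layout, and the whitespace stripping are assumptions for the example, not the harness's actual API.

```python
def exact_match_rate(generations, references):
    # Fraction of problems whose first candidate equals the reference,
    # comparing strings after stripping surrounding whitespace.
    hits = sum(
        1 for cands, ref in zip(generations, references)
        if cands and cands[0].strip() == ref.strip()
    )
    return hits / len(references)

# Hypothetical layout: one list of candidate completions per problem,
# as a "saved_generations.json"-style structure might hold.
generations = [["return x + y;"], ["return length(v);"]]
references = ["return x + y;", "return normalize(v);"]
print(exact_match_rate(generations, references))  # 0.5
```

Because the comparison is purely textual, no shader code is executed for Task-1, which is why the command above omits `--allow_code_execution`.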