Commit

No public description
PiperOrigin-RevId: 647318172
minsukkahng authored and RyanMullins committed Jun 27, 2024
1 parent 3729e3d commit 12d9424
Showing 1 changed file, README.md, with 10 additions and 8 deletions.
@@ -1,6 +1,6 @@
 # LLM Comparator
 
-LLM Comparator is a python library and interactive visualization tool for
+LLM Comparator is an interactive visualization tool with a Python library for
 analyzing side-by-side LLM evaluation results.
 It is designed to help people qualitatively analyze how
 responses from two models differ at example- and slice-levels. Users can
@@ -35,8 +35,8 @@ The tool helps you analyze *when* and *why* Gemma 1.1 is better or worse than
 
 - ***When***: The **Score Distribution** and **Metrics by Prompt Category**
 panels show that the quality of responses from Model A (Gemma 1.1) is considered
-better than that from Model B (Gemma 1.0) (larger blue area than orange;
->50% win rate), according to the LLM-based evaluation method
+better than that from Model B (Gemma 1.0) (larger blue area than orange; >50%
+win rate), according to the LLM-based evaluation method
 ([LLM-as-a-judge](https://arxiv.org/abs/2306.05685)).
 This holds true for most prompt categories (e.g., Humanities, Math).
 - ***Why***: The **Rationale Summary** panel dives into the reasons behind these
@@ -53,12 +53,14 @@ from Gemma 1.1 starts with it.
 
 ## Python Library for Creating JSON File
 
-This project provides the `llm-comparator` package on PyPI, which create JSON
+This project provides the `llm-comparator` package on PyPI, which creates JSON
 files for use with the LLM Comparator visualization. This package can create the
-entire JSON file, including side-by-side analysis, given a set of input prompts
-to run and models to run them on. Or, if a user already has prompts and an
-existing set of model outputs, it can perform just the side-by-side analysis
-steps. For more details, see the [Python library README](python/README.md).
+entire JSON file, including side-by-side LLM-based evaluation and rationale
+clusters, given a set of input prompts to run and models to run them on. Or, if
+a user already has prompts and an existing set of model outputs, it can perform
+just the rationale clustering steps. For more details, see the
+[Python library README](python/README.md).


## JSON Data Format

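The JSON file that the `llm-comparator` package produces can also be assembled by hand when you already have paired model outputs. The following is a minimal, hypothetical sketch: the field names (`models`, `examples`, `input_text`, `output_text_a`/`output_text_b`, `tags`, `score`) are assumptions for illustration only, not the authoritative schema — consult the JSON Data Format section and `python/README.md` before relying on them.

```python
import json

# Minimal side-by-side comparison file for the LLM Comparator viewer.
# NOTE: the field names below are illustrative assumptions, not the
# authoritative schema; check the project's JSON Data Format docs.
data = {
    "models": [{"name": "Gemma 1.0"}, {"name": "Gemma 1.1"}],
    "examples": [
        {
            "input_text": "Explain the Pythagorean theorem.",
            "tags": ["Math"],  # prompt category, used for slice-level views
            "output_text_a": "It's a^2 + b^2 = c^2 for right triangles.",
            "output_text_b": (
                "For a right triangle with legs a and b and hypotenuse c, "
                "a^2 + b^2 = c^2."
            ),
            # Judge score; in this sketch the sign encodes which model won.
            "score": 0.5,
        }
    ],
}

with open("comparison.json", "w") as f:
    json.dump(data, f, indent=2)
```

A file like this could then be loaded into the visualization; for evaluation runs generated end-to-end (LLM judging and rationale clustering), the library's own pipeline described above produces the complete file.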
