knights_knaves #196

vncntt · 2025-02-24T09:50:17Z

Haven't implemented unit tests yet.
Would like someone to check the score_answer function.

        format = ", ".join(f"{name} is a {knight_knave['knight']}/{knight_knave['knave']}" for name in names[:-1])
        if len(names) > 1:
            format += f", and {names[-1]} is a {knight_knave['knight']}/{knight_knave['knave']}"
        else:
            format = f"{names[0]} is a {knight_knave['knight']}/{knight_knave['knave']}"

        text += f' (Format your answer like: "{format}")'

    def _normalize_answer(self, answer: str) -> set[tuple[str, str]]:
        """Convert answer string into normalized set of (name, role) tuples"""
        # Remove common punctuation and standardize spacing
        answer = answer.lower().strip().replace(".", "").replace(",", "")

        # Split on 'and' or spaces for different formats
        parts = [p.strip() for p in answer.replace(" and ", " ").split()]

        # Extract name-role pairs
        assignments = set()
        current_name = None

        for part in parts:
            if part in ["is", "a"]:
                continue
            if part in ["knight", "knave"]:
                if current_name:
                    assignments.add((current_name, part))
                    current_name = None
            else:
                current_name = part

        return assignments

    def score_answer(self, answer: Optional[str], entry: dict[str, Any]) -> float:
        """Score an answer against the oracle answer."""
        if answer is None or len(answer) == 0:
            return 0.0

        try:
            oracle_assignments = self._normalize_answer(entry["answer"])
            answer_assignments = self._normalize_answer(answer)

            # Full credit for exact assignments regardless of order
            if oracle_assignments == answer_assignments:
                return 1.0

            # Partial credit if the answer has the same number of assignments and some match
            if len(oracle_assignments) == len(answer_assignments):
                matching = len(oracle_assignments.intersection(answer_assignments))
                if matching > 0:
                    return 0.3 + (0.7 * matching / len(oracle_assignments))

            return 0.01

        except Exception:
            # If parsing fails, give minimal credit
            return 0.01

reasoning_gym/logic/knights_knaves.py
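
For anyone checking score_answer, here is a minimal standalone sketch of the parsing and scoring logic from the excerpt above. It is not the merged implementation: the functions are module-level copies rather than methods, the oracle is passed as a plain string instead of `entry["answer"]`, the try/except wrapper is omitted, and the names Zoey, Riley, and Alice are made up for illustration.

```python
from typing import Optional


def normalize_answer(answer: str) -> set[tuple[str, str]]:
    """Turn 'Zoey is a knight, and Riley is a knave' into a set of (name, role) pairs."""
    cleaned = answer.lower().strip().replace(".", "").replace(",", "")
    assignments: set[tuple[str, str]] = set()
    current_name = None
    for part in cleaned.replace(" and ", " ").split():
        if part in ("is", "a"):
            continue  # filler words carry no information
        if part in ("knight", "knave"):
            if current_name:
                assignments.add((current_name, part))
                current_name = None
        else:
            current_name = part  # any other token is treated as a name
    return assignments


def score_answer(answer: Optional[str], oracle: str) -> float:
    """Score an answer string against the oracle string (hypothetical signature)."""
    if not answer:
        return 0.0
    expected = normalize_answer(oracle)
    given = normalize_answer(answer)
    if expected == given:
        return 1.0  # order of the assignments does not matter
    if len(expected) == len(given):
        matching = len(expected & given)
        if matching > 0:
            return 0.3 + 0.7 * matching / len(expected)
    return 0.01


ORACLE = "Zoey is a knight, and Riley is a knave"  # made-up example puzzle answer

print(score_answer("Riley is a knave, and Zoey is a knight.", ORACLE))  # full credit: 1.0
print(score_answer("Zoey is a knight, and Riley is a knight", ORACLE))  # partial credit, about 0.65
print(score_answer("Zoey is a knight", ORACLE))  # wrong count: 0.01
print(score_answer("", ORACLE))  # empty answer: 0.0
```

One thing this sketch makes visible: the partial-credit branch compares only the number of assignments, not the names, so "Alice is a knight, and Riley is a knave" would also receive partial credit against the oracle above. Whether that is intended is a question for review.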

vncntt · 2025-02-25T05:00:08Z

r1 results look good.

Am I supposed to remove the r1 test results from the commit?

andreaskoepf · 2025-02-25T09:17:05Z

@vncntt Still need to add info about this to this repo. Our eval repo, where we collect the results, is here: https://github.com/open-thought/reasoning-gym-eval/
Please remove the r1 eval result from this PR and, if you like, submit it as a separate PR to reasoning-gym-eval.

vncntt · 2025-02-25T14:55:04Z

@andreaskoepf should be mostly good? I copied some code from https://github.com/AlphaPav/mem-kk-logic/blob/main/data_prep/lib_kk.py, so we might have to cite it. Not sure how licenses work.

andreaskoepf

Thanks a lot! :-)

andreaskoepf marked this pull request as draft February 24, 2025 09:57

andreaskoepf reviewed Feb 24, 2025

reasoning_gym/logic/knights_knaves.py (outdated)

andreaskoepf reviewed Feb 24, 2025

reasoning_gym/logic/knights_knaves.py

andreaskoepf marked this pull request as ready for review February 25, 2025 18:42

vncntt added 5 commits February 25, 2025 19:54

knights_knaves

c682d1e

fix scoring bug

a8224d4

unit tests + r1 eval

e12242d

remove r1 tests

5e1d0c0

fix test.yaml

66c10c0

andreaskoepf force-pushed the knights-knaves branch from 3d0e803 to 66c10c0 Compare February 25, 2025 18:54

andreaskoepf added 2 commits February 25, 2025 20:01

docs: Add attribution for Knights and Knaves dataset implementation

b8e7d54

style: Format citation in NOTICE.txt for consistent spacing

58015b6

andreaskoepf enabled auto-merge (squash) February 25, 2025 19:07

andreaskoepf added 2 commits February 25, 2025 20:08

remove print statement

bc270e6

minor formatting

8d5f6b1

andreaskoepf approved these changes Feb 25, 2025

andreaskoepf merged commit 5f01049 into open-thought:main Feb 25, 2025
3 checks passed

andreaskoepf linked an issue Feb 25, 2025 that may be closed by this pull request

Add Knights and Knaves puzzle dataset #187

Closed
