knights_knaves #196

Merged
merged 9 commits into from
Feb 25, 2025

Conversation

vncntt
Contributor

@vncntt vncntt commented Feb 24, 2025

for #187

  • haven't implemented unit tests yet
  • would want someone to check the score_answer function.
        format = ", ".join(f"{name} is a {knight_knave['knight']}/{knight_knave['knave']}" for name in names[:-1])
        if len(names) > 1:
            format += f", and {names[-1]} is a {knight_knave['knight']}/{knight_knave['knave']}"
        else:
            format = f"{names[0]} is a {knight_knave['knight']}/{knight_knave['knave']}"

        text += f' (Format your answer like: "{format}")'
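
The hint-building hunk above can be read as a small standalone helper. The following is only a sketch: `format_hint` and its `knight`/`knave` parameters are hypothetical names that stand in for the `knight_knave` dict used in the PR, and a non-empty `names` list is assumed.

```python
def format_hint(names: list[str], knight: str = "knight", knave: str = "knave") -> str:
    """Build the answer-format hint for a non-empty list of names (hypothetical helper)."""
    clauses = [f"{name} is a {knight}/{knave}" for name in names]
    if len(clauses) == 1:
        # A single name needs no separator.
        return clauses[0]
    # Comma-separate all but the last clause, then close with ", and <last clause>".
    return ", ".join(clauses[:-1]) + f", and {clauses[-1]}"


print(format_hint(["Zoey"]))
print(format_hint(["Zoey", "Oliver"]))
print(format_hint(["Zoey", "Oliver", "Mia"]))
```

Building the clauses once avoids repeating the f-string in three places and does not shadow the built-in `format`.
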
    def _normalize_answer(self, answer: str) -> set[tuple[str, str]]:
        """Convert answer string into normalized set of (name, role) tuples"""
        # Remove common punctuation and standardize spacing
        answer = answer.lower().strip().replace(".", "").replace(",", "")

        # Split on 'and' or spaces for different formats
        parts = [p.strip() for p in answer.replace(" and ", " ").split()]

        # Extract name-role pairs
        assignments = set()
        current_name = None

        for part in parts:
            if part in ["is", "a"]:
                continue
            if part in ["knight", "knave"]:
                if current_name:
                    assignments.add((current_name, part))
                    current_name = None
            else:
                current_name = part

        return assignments

    def score_answer(self, answer: Optional[str], entry: dict[str, Any]) -> float:
        """Score an answer against the oracle answer."""
        if answer is None or len(answer) == 0:
            return 0.0

        try:
            oracle_assignments = self._normalize_answer(entry["answer"])
            answer_assignments = self._normalize_answer(answer)

            # Full credit for exact assignments regardless of order
            if oracle_assignments == answer_assignments:
                return 1.0

            # Partial credit if the answer has as many assignments as the oracle and some of them match
            if len(oracle_assignments) == len(answer_assignments):
                matching = len(oracle_assignments.intersection(answer_assignments))
                if matching > 0:
                    return 0.3 + (0.7 * matching / len(oracle_assignments))

            return 0.01

        except Exception:
            # If parsing fails, give minimal credit
            return 0.01
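
Since the description asks for a check of `score_answer`, here is a minimal standalone sketch for trying the logic on a few answers. It assumes free functions instead of dataset methods, passes the oracle string directly instead of an `entry` dict, and drops the `try/except`; the example names and sentences are made up for illustration.

```python
from typing import Optional


def normalize_answer(answer: str) -> set[tuple[str, str]]:
    """Convert an answer string into a set of (name, role) tuples."""
    answer = answer.lower().strip().replace(".", "").replace(",", "")
    parts = answer.replace(" and ", " ").split()
    assignments: set[tuple[str, str]] = set()
    current_name = None
    for part in parts:
        if part in ("is", "a"):
            continue
        if part in ("knight", "knave"):
            if current_name:
                assignments.add((current_name, part))
                current_name = None
        else:
            # Any other token is treated as the most recent name seen.
            current_name = part
    return assignments


def score_answer(answer: Optional[str], oracle: str) -> float:
    """Mirror the scoring rules from the PR on plain strings."""
    if not answer:
        return 0.0
    expected = normalize_answer(oracle)
    given = normalize_answer(answer)
    if expected == given:
        return 1.0
    if len(expected) == len(given):
        matching = len(expected & given)
        if matching > 0:
            return 0.3 + 0.7 * matching / len(expected)
    return 0.01


oracle = "Zoey is a knight, and Oliver is a knave"
# Same assignments in a different order: full credit.
print(score_answer("Oliver is a knave and Zoey is a knight.", oracle))
# One of two assignments matches: about 0.65 (0.3 + 0.7 * 1/2).
print(score_answer("Zoey is a knight, and Oliver is a knight", oracle))
# A wrong name also scores about 0.65, because only the count is compared.
print(score_answer("Zoey is a knight, and Bob is a knave", oracle))
# Missing assignment: minimal credit.
print(score_answer("Zoey is a knight", oracle))
```

The third call is the case worth a second look: the partial-credit branch compares the number of assignments, not the set of names, so an answer that names the wrong person still earns partial credit.
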

@andreaskoepf andreaskoepf marked this pull request as draft February 24, 2025 09:57
@vncntt
Contributor Author

vncntt commented Feb 25, 2025

r1 results look good.

am i supposed to remove the r1 test results from the commit?

@andreaskoepf
Contributor

andreaskoepf commented Feb 25, 2025

@vncntt We still need to add info about this to the repo. Our eval repo, where we collect the results, is here: https://github.com/open-thought/reasoning-gym-eval/
Please remove the r1 eval result from this PR and, if you like, submit it as a separate PR to reasoning-gym-eval.

@vncntt
Contributor Author

vncntt commented Feb 25, 2025

@andreaskoepf Should be mostly good? I copied some code from https://github.com/AlphaPav/mem-kk-logic/blob/main/data_prep/lib_kk.py so we might have to cite it. Not sure how licenses work.

@andreaskoepf andreaskoepf marked this pull request as ready for review February 25, 2025 18:42
@andreaskoepf andreaskoepf enabled auto-merge (squash) February 25, 2025 19:07
Contributor

@andreaskoepf andreaskoepf left a comment

Thanks a lot! :-)

@andreaskoepf andreaskoepf merged commit 5f01049 into open-thought:main Feb 25, 2025
3 checks passed
@andreaskoepf andreaskoepf linked an issue Feb 25, 2025 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Add Knights and Knaves puzzle dataset
2 participants