Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check_proof is not giving the desired feedback #221

Closed
shangetang opened this issue Feb 12, 2025 · 1 comment
Closed

check_proof is not giving the desired feedback #221

shangetang opened this issue Feb 12, 2025 · 1 comment

Comments

@shangetang
Copy link

shangetang commented Feb 12, 2025

Description
I use the new method lean_dojo.check_proof, but it is giving the wrong feedback. When I give the correct proof, it gives "False". But when I use "
sorry" as the proof, it gives "True".

Detailed Steps to Reproduce the Behavior

import os
from lean_dojo import *

repo = LeanGitRepo("https://github.com/wzj423/lean-dojo-mew", "d08b8ba")

file_path = "MiniF2F/Test.lean"
thm_name = "mathd_numbertheory_150"

theorem = Theorem(repo, file_path, thm_name)

step1 = '''by
  have h₁ : 6 ≤ n := by
    -- We use contraposition to prove the statement.
    contrapose! h₀
    -- We check the values of N starting from 1.
    interval_cases n <;> norm_num [Nat.Prime]
  -- We have shown that 6 is the smallest N such that 7 + 30N is not a prime number.
  exact h₁'''



result1 = check_proof(theorem, step1)

print("result1:",result1)`

Logs in Debug Mode
2025-02-11 21:52:09.453 | DEBUG | lean_dojo.data_extraction.lean:<module>:45 - Using GitHub without authentication. Don't be surprised if you hit the API rate limit. 2025-02-11 21:52:10.023 | DEBUG | lean_dojo.data_extraction.lean:_to_commit_hash:195 - Querying the commit hash for lean-dojo-mew d08b8ba 2025-02-11 21:52:10.554 | DEBUG | lean_dojo.data_extraction.trace:get_traced_repo_path:219 - The traced repo is available in the cache. 2025-02-11 21:52:12.084 | DEBUG | lean_dojo.interaction.dojo:check_proof:137 - Modifying MiniF2F/Test.leaninto/scratch/gpfs/st3812/.cache/lean_dojo/wzj423-lean-dojo-mew-d08b8ba9bad48a7a6497bad6fde21bace85128e2/lean-dojo-mew/MiniF2F/Testqgflz5h0.lean2025-02-11 21:52:12.086 | DEBUG | lean_dojo.utils:execute:110 - lake env lean --threads=1 --memory=32768 MiniF2F/Testqgflz5h0.lean 2025-02-11 21:52:39.056 | INFO | lean_dojo.utils:execute:115 - MiniF2F/Testqgflz5h0.lean:11:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:16:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:20:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:25:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:31:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:35:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:38:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:43:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:47:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:50:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:55:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:64:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:71:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:77:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:81:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:85:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:89:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:94:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:99:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:103:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:106:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:110:39: warning: unused variableh₀[linter.unusedVariables] MiniF2F/Testqgflz5h0.lean:114:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:118:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:123:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:129:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:133:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:138:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:143:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:147:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:151:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:162:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:165:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:169:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:172:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:201:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:206:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:214:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:218:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:221:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:229:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:234:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:238:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:249:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:253:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:256:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:259:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:262:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:265:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:271:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:276:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:281:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:286:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:296:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:303:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:308:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:313:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:320:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:328:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:332:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:336:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:344:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:354:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:358:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:363:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:368:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:379:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:385:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:389:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:392:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:396:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:400:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:407:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:410:35: warning: unused variableh₀[linter.unusedVariables] MiniF2F/Testqgflz5h0.lean:419:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:423:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:427:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:431:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:435:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:439:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:443:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:446:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:450:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:456:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:487:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:503:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:506:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:510:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:516:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:521:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:525:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:528:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:532:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:536:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:540:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:547:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:557:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:562:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:566:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:571:8: warning: declaration uses 'sorry' MiniF2F/Testqgflz5h0.lean:578:7: warning: unused variableh₃` [linter.unusedVariables]
MiniF2F/Testqgflz5h0.lean:582:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:587:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:590:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:594:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:597:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:601:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:606:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:611:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:614:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:620:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:626:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:630:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:638:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:642:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:646:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:651:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:657:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:664:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:668:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:672:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:677:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:681:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:685:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:696:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:701:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:711:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:716:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:723:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:727:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:731:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:734:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:737:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:742:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:746:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:756:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:760:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:763:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:766:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:770:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:774:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:778:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:783:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:786:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:790:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:795:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:799:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:804:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:808:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:813:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:820:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:825:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:830:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:835:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:854:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:858:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:862:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:867:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:874:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:878:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:888:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:892:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:897:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:900:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:916:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:921:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:925:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:929:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:933:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:938:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:943:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:947:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:952:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:961:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:965:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:969:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:974:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:982:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:988:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:994:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:998:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1002:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1006:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1014:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1021:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1026:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1029:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1033:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1037:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1041:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1046:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1050:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1056:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1066:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1073:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1079:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1085:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1090:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1097:21: error: unsolved goals
case «0»
n : ℕ
h₀ : 0 < 6
⊢ Irreducible 7

case «1»
n : ℕ
h₀ : 1 < 6
⊢ Irreducible 37

case «2»
n : ℕ
h₀ : 2 < 6
⊢ Irreducible 67

case «3»
n : ℕ
h₀ : 3 < 6
⊢ Irreducible 97

case «4»
n : ℕ
h₀ : 4 < 6
⊢ Irreducible 127

case «5»
n : ℕ
h₀ : 5 < 6
⊢ Irreducible 157
MiniF2F/Testqgflz5h0.lean:1107:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1114:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1122:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1129:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1134:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1137:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1142:8: warning: declaration uses 'sorry'
MiniF2F/Testqgflz5h0.lean:1146:8: warning: declaration uses 'sorry'

2025-02-11 21:52:39.058 | ERROR | lean_dojo.utils:execute:116 - warning: manifest out of date: git url of dependency 'mathlib' changed; use lake update mathlib to update it
warning: manifest out of date: git revision of dependency 'mathlib' changed; use lake update mathlib to update it
warning: manifest out of date: git url of dependency 'aesop' changed; use lake update aesop to update it
warning: manifest out of date: git revision of dependency 'aesop' changed; use lake update aesop to update it

result1: False`

For a second example, I replace the step1 to be " sorry", the feedback will be True:

2025-02-11 21:55:26.705 | DEBUG | lean_dojo.data_extraction.lean:<module>:45 - Using GitHub without authentication. Don't be surprised if you hit the API rate limit. 2025-02-11 21:55:27.189 | DEBUG | lean_dojo.data_extraction.lean:_to_commit_hash:195 - Querying the commit hash for lean-dojo-mew d08b8ba 2025-02-11 21:55:27.588 | DEBUG | lean_dojo.data_extraction.trace:get_traced_repo_path:219 - The traced repo is available in the cache. 2025-02-11 21:55:29.110 | DEBUG | lean_dojo.interaction.dojo:check_proof:137 - Modifying MiniF2F/Test.leaninto/scratch/gpfs/st3812/.cache/lean_dojo/wzj423-lean-dojo-mew-d08b8ba9bad48a7a6497bad6fde21bace85128e2/lean-dojo-mew/MiniF2F/Test0a8mtg2g.lean 2025-02-11 21:55:29.111 | DEBUG | lean_dojo.utils:execute:110 - lake env lean --threads=1 --memory=32768 MiniF2F/Test0a8mtg2g.lean result2: True

@yangky11
Copy link
Member

fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants