
Result of Localization #21

Open
GCVulnerability opened this issue Aug 13, 2024 · 8 comments

Comments

@GCVulnerability

Hi, Agentless is amazing work. I noticed that "% Correct Location" is mentioned in the paper.
I'm really interested in fault localization on SWE-bench.
Could you please provide the ground truth for SWE-bench Lite and the evaluation code?

@brutalsavage
Contributor

Thanks for the question, we will release that soon!

@yorhaha

yorhaha commented Aug 20, 2024

> Thanks for the question, we will release that soon!

Could you please give an approximate release time? Otherwise, we will consider implementing our own evaluation code, but that may lead to differences in our results.

Thanks for your work.

@yorhaha

yorhaha commented Aug 20, 2024

Also, may I ask whether you calculate the recall value based on the final generated patch and the ground-truth patch (without considering the code retrieved in the intermediate steps before generating the patch)?

@brutalsavage
Contributor

> Could you please give an approximate release time?

Our hope is sometime this week or early next week.

> calculate recall value

Not totally sure what you mean; can you please elaborate a bit more?

@yorhaha

yorhaha commented Aug 21, 2024

By recall value (as used in the SWE-bench paper), I meant "% Correct Location" in your paper. But after reading your paper more carefully, I now think the two concepts are different.

The recall value measures retrieval performance of RAG in the SWE-bench paper. I am confused by the meaning of "% Correct Location": doesn't it encourage more code changes (to cover the ground-truth patch)?

@brutalsavage
Contributor

Right, so in our paper "% Correct Location" measures the percentage of time the patch edits the same locations as the ground-truth developer patch. We count it as the correct location if the patch edits a superset of all the ground-truth locations. For example, at the function granularity, if a patch edits func1 and func2 but the ground-truth patch edits only func1, we still count it as correct. You can see Section 3 of the paper for more details.
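A minimal sketch of that superset check, assuming edit locations are represented as sets of function names; the names below are illustrative and not the repository's actual evaluation code:

```python
# Hypothetical illustration of the "% Correct Location" check at function
# granularity; function and variable names here are not Agentless's API.

def is_correct_location(patch_funcs: set[str], ground_truth_funcs: set[str]) -> bool:
    """A patch counts as a correct location if the functions it edits
    are a superset of every function edited by the ground-truth patch."""
    return ground_truth_funcs.issubset(patch_funcs)

def correct_location_rate(results: list[tuple[set[str], set[str]]]) -> float:
    """Percentage of instances whose patch covers all ground-truth locations."""
    hits = sum(is_correct_location(patch, gt) for patch, gt in results)
    return 100.0 * hits / len(results)

# Example from the comment above: the patch edits func1 and func2 while the
# ground-truth patch edits only func1, so it still counts as correct.
assert is_correct_location({"func1", "func2"}, {"func1"})
```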

@yorhaha

yorhaha commented Aug 21, 2024

Thanks for your explanation! I get it now.

@UniverseFly

Any updates on the evaluation of fault localization accuracy?
