Validator role #159
Just some initial thoughts:
What's the motivation behind having validators improve submissions by wardens? I find it strange that validators can modify findings such that it affects the final payout distribution amongst wardens. Let's say a validator builds on a report to change it from invalid to valid. Wouldn't this then be unfair to other wardens who identified valid findings by themselves? Additionally, the warden who submitted the original report is now paid for submitting an invalid report, which I find strange. Also, what's the difference between valid/invalid and unsatisfactory/satisfactory?
Isn't this a conflict of interest? The judge can improve a submission as a validator and then give himself a 75% cut. |
Adding docs as relevant info: |
I am reposting a revision of feedback that was previously shared privately.
General Feedback
There's definitely a lot to digest with the latest Code4rena Spring Update, specifically around the new Validator role and how it functions within the overall contest flow. As preliminary feedback, I believe that the changes introduced in the Spring Update are substantial and therefore merit a gradual roll-out via "experimental" contests before the latest changes are applied to all upcoming contests.
Validator % Capture
Problem Statement
Multiple safeguards must be set in place for this feature to function as Code4rena originally envisioned. Redirecting a warden's proceeds based on non-quantitative measurements (i.e. increase of quality, introduction of a PoC that may not be needed, etc.) is bound to cause fierce debates, especially at the currently proposed level of granularity (25%, 50%, 75%).
Proposed Solution
In my opinion, the most transparent way to enforce the new rules around Validators (albeit slightly technically challenging on C4's side) would be for the judge to be unaware of which submission was original and which was improved. This will eliminate prejudice which, however impartial a judge may be, is bound to arise depending on how data is presented to them. Additionally, this will allow a "fairer" evaluation of what cut the Validator should acquire, given that C4's toolkit can simply check whether no change was detected (i.e. they are marked as duplicates), and so on. The above adjustment will also permit underperforming Validators to be detected and lowered in the selection pool.
Validator Post-Contest Finding
Problem Statement
Another, albeit smaller, problem with the above is that a validator effectively gets an extension on the contest's deadline: they can "submit issues" after the deadline by editing a very low-effort, ambiguous finding and introducing details that move it from "invalid" to "valid".
Proposed Solution
No dedicated solution needs to exist for this, though guideline changes on the C4 website may need to be introduced for Judges. Specifically, judges who review "improved" findings must assess whether one can reasonably be deduced from the other and, if not, invalidate both the improved and non-improved variants. Note that this approach is compatible with the previous section's recommendation, as it can be introduced as a step beyond the first "judge passing" validation.
Validator Misaligned Incentive
Problem Statement
I understand that there is a level of trust associated with all roles in C4; however, when evaluating whether a feature in the contest's process is appropriate, we must consider the scenario of a malicious Validator, however unlikely. In the recently rolled-out feature of presorters invalidating QA and Gas Reports, low-quality submissions were deliberately excluded so that no finding could be missed by the contest's Judge. The newly proposed feature permits a Validator, on an entirely different repository not monitored by the Judge, to invalidate findings before they ever reach the team, the judge, or the presorter. In and of itself, this permits a Validator who has participated in the contest in another capacity to eliminate potentially valid findings.
Proposed Solution
Each validator of a particular contest acts independently of the others. Afterward, the presorter is responsible for picking the "best" revision of a particular finding that the validators created. This will significantly reduce the likelihood of a single malicious Validator having any effect on the final findings that the Judge ultimately reviews.
Warden Disputes
Problem Statement
The present mechanism has not made it clear how Wardens are able to dispute findings that arise from Validator "enhancements".
Additionally, the scenario of a warden participating in a contest without backstage access, acquiring it afterward, and raising concerns within the PJQA window has not been addressed. As a final consideration, a finding that was adjusted by a Validator may eventually be debated by both the validator that enhanced it and the finding's original author, which, in my opinion, will be a very tricky situation to resolve.
Proposed Solution
Wardens should be able to raise PJQA concerns about validator upgrades as well. Per the earlier concerns raised in this post, wardens should, for example, be able to raise a PJQA concern requesting that a finding upgraded by a validator from invalid to valid be re-assessed for reasonable deduction.
Concluding Thoughts
I would like to reiterate that I do not believe the feature is optimal for production at this stage and, in my humble opinion, it requires further iteration by multiple judges before being applied to all upcoming contests. |
I concur with @alex-ppg and @MiloTruck; this feature is |
Personally, I think the idea of letting validators improve warden submissions is really, really good. It will lead to higher-quality submissions overall and reduce the burden on sponsors and judges of validating incomplete submissions. That being said, sponsor input is a key part of the judging lifecycle, and the new process places that burden squarely on the shoulders of validators, who have no sponsor input at that point. Validators are now expected to make a binary decision on all findings. They either mark them as "valid" and are penalized if they aren't, or mark them "invalid", in which case it is unclear what the penalty is if an issue is successfully escalated in PJQA. If we expect what used to be "sufficient quality submission" findings to now be "valid", then the penalty for missing a valid finding must be higher than that for false positives. Otherwise, the incentives become heavily skewed towards rejecting ambiguous findings. If I believe a finding has a 30% chance of being valid, then I am incentivized to mark it as "invalid", whereas previously it would have been forwarded to the sponsor and judge to make the decision. Not penalizing false negatives more harshly, or at all, will lead to more PJQA appeals and potentially missed valid findings. I also honestly don't see how the proposed mechanisms of validating in batches of 5 findings and allowing for enhancements are compatible with one another. The validator will be blind to the complete set of duplicates of a given finding and may choose to enhance it, while a more complete duplicate that negates the need for an enhancement may exist later in the queue. It is also unclear how each dupe set is meant to be assigned to a single validator given the batching mechanism.
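To make the incentive argument concrete, here is a minimal Python sketch of a validator's expected penalty for each decision. It assumes, purely for illustration, that a wrong call adds a fixed weight to an "incorrect" count. `fp_weight` applies to forwarding a finding that turns out invalid, and `fn_weight` applies to closing one that is later found valid. These names and the linear penalty model are hypothetical and are not C4's published scoring.

```python
def expected_incorrect(p_valid: float, mark_valid: bool,
                       fp_weight: float = 1.0, fn_weight: float = 1.0) -> float:
    """Expected addition to a validator's 'incorrect' count (hypothetical model).

    p_valid: the validator's own belief that the finding is valid.
    Marking valid risks a false positive if the finding is actually invalid;
    marking invalid risks a false negative if it is valid and later restored.
    """
    if mark_valid:
        return (1.0 - p_valid) * fp_weight
    return p_valid * fn_weight


def best_decision(p_valid: float, fp_weight: float = 1.0, fn_weight: float = 1.0) -> str:
    """Pick the label with the lower expected penalty (ties go to 'invalid')."""
    valid_cost = expected_incorrect(p_valid, True, fp_weight, fn_weight)
    invalid_cost = expected_incorrect(p_valid, False, fp_weight, fn_weight)
    return "valid" if valid_cost < invalid_cost else "invalid"


def indifference_point(fp_weight: float = 1.0, fn_weight: float = 1.0) -> float:
    """Belief above which marking 'valid' minimises the expected penalty."""
    return fp_weight / (fp_weight + fn_weight)


print(best_decision(0.3))                  # equal weights
print(best_decision(0.3, fn_weight=0.0))   # false negatives unpenalized
print(best_decision(0.3, fn_weight=2.0))   # false negatives count double
print(indifference_point(1.0, 2.0))
```

With equal weights, marking "valid" only pays off above a 50% belief, and with no false-negative penalty it never strictly pays off. Even when false negatives are weighted double (as the later adjustment in this thread proposes), a finding believed 30% likely to be valid is still cheaper to close: the expected penalty is 0.6 for closing versus 0.7 for forwarding, because the break-even point is `fp_weight / (fp_weight + fn_weight)` = 1/3.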
Moreover, while others and I have found the deduping tool highly useful for presorting, it is not a silver bullet. I believe that if we had some metrics on its accuracy versus final judging results, it would become clear that it fails to correctly label duplicates for a large number of the final unique findings. This change completely strips away the ability (and responsibility) of the lookout to identify duplicates across the full set of findings. I believe this will negatively impact the validator's work and significantly increase the workload of the judge (and likely the sponsor), who now has to do the real deduping. So to summarize:
|
My recommendation would be to have the judge forgo all validator rewards, as is the case when the judge participates in a contest they subsequently judge. |
I do not believe that the above bullet point is correct. Presently, a finding does not get a binary judgment of whether it is valid or not; it may be primary, it may be penalized partially, or lack sufficient proof even if it was on the right track. An enhancement of duplicates makes sense, as findings that could have been penalized may ultimately be accepted for a full reward.
I believe this is a very fair point, and important to consider. There are many instances of multiple attack vectors being combined to raise the severity of a single submission. In such instances, all single-vector duplicates are partially awarded (due to being of a lower severity) and are grouped under the one that combines all of them, as detailed in the C4 docs. A Validator will indeed have full access to all submissions after the contest ends and will be able to combine vulnerabilities that would otherwise not have been combined during the normal course of the contest. As a Validator rule, I think it might be wise to prevent the combination of existing findings reported in the contest to avoid the specified scenario. I understand the value that a combination of vulnerabilities adds to the ultimate Sponsor output; however, accepting such combinations would be unfair and would indeed rely on privileged information. |
Is it wise to give validators access to the usual findings repo when they can use this as inspiration to enhance issues? I feel this would be very detrimental to top wardens, which is contrary to the stated goal of the Spring update. Beginner/Intermediate Finding:
Top Warden Finding:
Validator:
Would the above example not become a common occurrence? The ability to take a potential or low vulnerability and transform it into a valid unique medium/high due to exceptionally specific knowledge is often what separates the top from the middle in my opinion. Perhaps it would be best to lock the usual findings repo until the validator work is finished? |
I see what you mean, but I feel like this is not (or shouldn't be) the purpose of the enhancements. You can improve every single duplicate until it reaches the level of the main finding, but what's the point of that? In that scenario the validator is essentially stealing rewards from all other duplicates, who would have received the partial credit penalty from the other findings. And they are not adding any value since the finding has already been properly demonstrated/articulated elsewhere. This would just incentivize duplication of efforts and render the concept of partial credit duplicates moot. |
I do not believe validators improving warden submissions has material benefit. If we examine the parties involved:
Sponsors:
The protocol receiving the security review does not benefit from this process unless the invalid/unsatisfactory report is either chosen for the actual report or is a unique finding. Adding another warden name to the header above the listed vulnerability is inconsequential to the project's security and overall report.
Wardens:
Those with satisfactory/valid reports are significantly and materially harmed by this process. The rewards would effectively get reorganized due to increased dupes, but it would be rare for a warden to actually receive more reward due to this process. Those with unsatisfactory/invalid reports benefit moderately from this process, as they will earn rewards they would otherwise not be entitled to, but the rewards are split between them and the validator.
Judges:
There is a moderate impact on judging, specifically due to the enhancement part of the change, in that judges have to determine the validity of the changes, etc. However, the validation process should significantly reduce judge workload overall, so the added work from judging enhancements is somewhat moot.
Validators:
As this is a newly created role and there are no validators yet, there is no existing baseline against which to weigh benefit or detriment. The validator role would benefit from the added authority and financial gain, but removing it does not harm anybody. Removing it may, however, reduce interest in the validator role due to reduced reward potential.
Alternative
While I personally do not think the enhancement is of benefit, should it be included, I propose an alternative idea that still includes validator enhancement: the validators have their own pool, which they compete for based on the number and severity of unsatisfactory -> satisfactory / invalid -> valid reports.
Unsatisfactory/invalid reports added to the prize pool are eligible for reduced prize shares. |
Some of my final thoughts on Validators in this current version. This discussion seems dead, even though Validators will be live for all audits in 2 days... In the new system, if you are not a low-level warden or a validator, you will receive LESS rewards
I believe projects do not benefit much from the current Validator model:
This will also lead to more total reports:
An example of reward dilution by Validators:
My initial calculations are incorrect due to reward dilution. Actually, Validators are incentivised to create more duplicates for issues they have not reported, diluting the total rewards for those issues and thereby increasing rewards for the issues they have reported. |
This is not how reward shares are calculated. The percentage each party receives for the vulnerability is technically correct if we ignore the selected-for-report multiplier. However, the total pool of funds for that vulnerability decreases with duplicates, so the wardens receive less than what you explain here. Shares are calculated on an exponential distribution:
Med Risk Slice: 3 * (0.9 ^ (split - 1)) / split
The reward shares are calculated on an exponential curve to discourage sybil attacks, which arguably makes the calculations even worse for wardens. How much of a change this actually causes depends on too many factors to calculate simply. |
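The quoted Medium formula can be checked with a few lines of Python. This sketch implements only the `3 * (0.9 ^ (split - 1)) / split` slice from the comment above. Like the comment, it ignores the selected-for-report multiplier. `med_slice` is a hypothetical helper name.

```python
def med_slice(split: int) -> float:
    """Shares earned by each warden on a Medium found by `split` wardens."""
    if split < 1:
        raise ValueError("split must be at least 1")
    return 3 * (0.9 ** (split - 1)) / split


for split in range(1, 6):
    per_warden = med_slice(split)
    # Total shares the issue claims from the pot = per-warden slice * split.
    print(f"split={split}: {per_warden:.4f} per warden, "
          f"{per_warden * split:.4f} for the issue")
```

A solo Medium earns 3 shares. With two wardens, each gets 1.35 and the issue as a whole claims only 2.7. Every extra duplicate therefore both splits the slice and shrinks it.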
I strongly believe validators shouldn't take a percentage of the main H/M pot. This will create a lot of issues (already cited in the discussions above). However, I also see the value of validators in improving the experience for sponsors (Client is king). I think Validators should compete for their own reward pots. They should have their own point system measuring how much they helped improve the readability and quality of the reports, and they should be rewarded according to how much effort they put into validating. Here is an example of a point system for validators, which could also aim to improve the speed of judging. For each issue you set as valid and send to the main repo, you get 1 point. If the judge later deems this issue unsatisfactory or low quality, you lose 2 points. Validators will be incentivised to do their job quickly (to farm the points earlier) and also to validate and improve the quality of the findings submitted. Regarding rejecting invalid or LLM-generated reports, I don't have a recommendation yet, but I think it is food for thought. In my opinion, taking shares of the rewards or reducing rewards of other wardens after the submissions are finished, with access to the high-quality submissions, will only create drama and escalation nightmares that wouldn't benefit the C4 brand long-term |
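Below is a minimal sketch of the proposed separate validator point system. It uses the +1 / -2 values from the comment above. As a further assumption that is not part of the proposal, it splits a dedicated validator pot pro rata to each validator's non-negative points. `ValidatorScore` and `split_pool` are hypothetical names.

```python
from dataclasses import dataclass, field

FORWARD_POINTS = 1     # issue set as valid and sent to the main repo
REJECT_PENALTY = 2     # judge later deems that issue unsatisfactory / low quality


@dataclass
class ValidatorScore:
    name: str
    points: int = 0
    forwarded: list = field(default_factory=list)

    def forward(self, issue_id: str) -> None:
        """Record an issue this validator forwarded to the main repo."""
        self.forwarded.append(issue_id)
        self.points += FORWARD_POINTS

    def judge_rejects(self, issue_id: str) -> None:
        """Apply the penalty when the judge rejects a forwarded issue."""
        if issue_id not in self.forwarded:
            raise KeyError(f"{issue_id} was not forwarded by {self.name}")
        self.points -= REJECT_PENALTY


def split_pool(pool: float, scores: list) -> dict:
    """Split a separate validator pot pro rata to non-negative points (assumed rule)."""
    positive = {s.name: max(s.points, 0) for s in scores}
    total = sum(positive.values())
    if total == 0:
        return {name: 0.0 for name in positive}
    return {name: pool * pts / total for name, pts in positive.items()}


alice, bob = ValidatorScore("alice"), ValidatorScore("bob")
for issue in ("H-01", "M-02", "M-03"):
    alice.forward(issue)
alice.judge_rejects("M-03")   # 3 - 2 = 1 point
bob.forward("M-04")
bob.forward("M-05")           # 2 points
print(split_pool(300.0, [alice, bob]))  # {'alice': 100.0, 'bob': 200.0}
```

Because a rejection costs twice what a forward earns, forwarding a finding only gains points in expectation if the validator believes it is more than two-thirds likely to survive judging.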
Understood, thanks for the clarification. Someone already corrected me in Discord and pointed out that the opposite of what I said is true. Validators are incentivised to create more duplicates for issues they have not reported themselves. This decreases the rewards for those issues while increasing the rewards for the issues they have reported themselves and are not creating more duplicates for. I.e., Validators are incentivised to reduce rewards for all issues they have not reported, reducing rewards for all other wardens apart from themselves. |
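The dilution incentive can be illustrated with a toy model. This sketch assumes every finding is a Medium scored with the slice formula quoted earlier in the thread. It also assumes the HM pot is divided pro rata to each finding's total shares. `payout_per_warden` is a hypothetical helper, and the pot size is arbitrary.

```python
def med_slice(split: int) -> float:
    """Per-warden shares for a Medium with `split` wardens (formula from the thread)."""
    return 3 * (0.9 ** (split - 1)) / split


def payout_per_warden(pot: float, dup_counts: dict) -> dict:
    """Map finding id -> payout for each warden credited on it.

    dup_counts: finding id -> number of wardens credited (all Mediums).
    Assumes the pot is shared pro rata to each finding's total shares.
    """
    total_shares = sum(med_slice(n) * n for n in dup_counts.values())
    return {fid: pot * med_slice(n) / total_shares for fid, n in dup_counts.items()}


before = payout_per_warden(10_000, {"A": 1, "B": 2})
# A validator who holds finding A enhances one more invalid duplicate of B.
after = payout_per_warden(10_000, {"A": 1, "B": 3})
for fid in ("A", "B"):
    print(fid, round(before[fid], 2), "->", round(after[fid], 2))
```

In this toy setup, the holder of A earns more even though nothing about A changed, while each warden on B earns less. That is the incentive described in the comment above.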
Thank you all for the very constructive critique of the Validator role. Based on the input here, I want to offer some answers to questions, and adjustments to the mechanisms as originally outlined.
Questions and answers
Wardens appealing improved submissions
Backstage wardens can certainly appeal the improvements during PJQA. [Edited to clarify that this is a backstage / PJQA process.]
Will QA issues be accounted for in the 50% accuracy?
Yes. If a Validator forwards a QA report to the findings repo as satisfactory, and the judge deems it unsatisfactory, it will count against their accuracy rating.
Changes to original guidelines
Validators may not improve duplicates of submissions already present in the findings repo
Several commenters noted that:
Given these critiques, we are updating the Validator guidelines as follows: if the finding is already present in the findings repo, then improved submissions will be judged in their original versions, and improvements disregarded.
Validators can only improve HM submissions, not QA/Gas reports
To clarify the Validator guidelines: improvements are only allowed for High and Medium-risk submissions (HMs), not QA or Gas reports.
Updated mechanism for selecting the top 3 QA reports
A few concerns were raised about the "top 3" approach to QA awards:
To address these concerns, we're proposing a slightly modified approach:
Penalties for false negatives
Updated consequences for Validators:
In other words, each false negative is double the negative value of false positives in the ‘incorrect’ count.
Clarifications
(Some of these questions were asked in Discord.)
Judges working as Validators
Judges may participate as Validators, but are not eligible for any HM pool payouts on audits they judge, even if they enhance an issue.
QA and Gas validation
Rules for % cuts & enhancements - root cause
Improved submissions must share the same root cause as the original submission.
Assignment process
There's an assignment process within the validation repo that will restrict these permissions to the Validator assigned to the submission. Therefore only one Validator would be allowed to enhance a finding. |
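One way to read the "false negatives count double" rule is sketched below. The response does not give an exact accuracy formula. This sketch therefore assumes accuracy = correct / (correct + incorrect), with each false negative adding 2 to the incorrect count, and uses the 50% figure from the Q&A above as the eligibility threshold. Both the formula and the name `validator_accuracy` are illustrative assumptions.

```python
def validator_accuracy(correct: int, false_positives: int,
                       false_negatives: int, fn_weight: int = 2) -> float:
    """Accuracy where each false negative counts `fn_weight` times as incorrect (assumed formula)."""
    incorrect = false_positives + fn_weight * false_negatives
    total = correct + incorrect
    return correct / total if total else 1.0


ACCURACY_THRESHOLD = 0.5  # assumed from the "50% accuracy" question above

for fp, fn in [(2, 1), (6, 0), (2, 4)]:
    acc = validator_accuracy(correct=8, false_positives=fp, false_negatives=fn)
    print(f"FP={fp} FN={fn}: accuracy {acc:.2f}, eligible={acc >= ACCURACY_THRESHOLD}")
```

Under these assumptions, a validator with 8 correct calls stays eligible after six false positives (8/14 ≈ 0.57). Two false positives plus four false negatives push them below the line (8/18 ≈ 0.44).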
An issue that we saw with the bot races was that some bots submitted gas findings that sounded plausible but turned out to be invalid once you tested them or understood the mechanisms behind them. Without PJQA, I fear the same will happen here with the three-vote system, for both gas and QA reports. |
@IllIllI000 wrote:
QA reports in the findings repo will be eligible for PJQA reassessment for satisfactory/unsatisfactory; only the closed reports in the validation repo will be excluded. The anti-outcome here is intense PVP behaviour among wardens vying for 1st/2nd/3rd place, and the proposed voting system is designed to avert that. My hope is also that since judges are now responsible for sorting QA reports into satisfactory / unsatisfactory, the initial assessment will be more thorough. |
This suggests that a validator can only improve an invalid/unsatisfactory finding if the finding is unique. I highly doubt that this will ever be the case. At this point it seems like we've lost the whole point of why the process of improving submissions even was proposed.
|
Edit: NVM, just realized that the vote is after PJQA, so wardens can point out invalid findings then. |
I believe that the rule iterations are a welcome change, however, I will maintain that this type of feature should have been in a peer review process for at least a few weeks before being shipped to production. We have observed that rule changes have come into effect as a result of judge feedback in just a few days, and this is a strong indicator that the feature should not have been shipped to production. I would like to provide some follow-up feedback on the revised ruleset:
Wardens already protest partial rewards based on quality, which is an inherently subjective matter (e.g. some Judges mandate PoCs, others do not). Opening a new escalation avenue will simply deter judges from accepting enhancements, so that they can avoid the hurdles and conflict that would arise from a Warden protesting the enhancement.
While I appreciate the effort in identifying a way forward to minimize the "edge" Validators have, I do not believe this to be the right approach. It is impossible to know what duplicates will occur when a project is handed off to the judge (I can personally cite instances with 20+ re-duplications). As such, this will end up as misallocated Validator effort. Another problem here is the C4 guideline around privilege escalation. Per C4 guidelines, it is fair game to combine two vulnerabilities to demonstrate a higher severity. What will happen if vulnerability A has no duplicates, vulnerability B has duplicates, and A has been enhanced to combine it with B for escalation? Similarly, what will happen if A has duplicates, B has duplicates, and only one report is enhanced to escalate both A and B? In such an instance, a Validator will have provided tangible value to the Sponsor, so it needs to be accounted for in one way or another.
While the rules around HM vulnerabilities are somewhat uniform, the rules for grading QA / Gas Reports are entirely up to the discretion of each judge. As such, letting such subjective evaluations affect a Validator's accuracy is unfair unless clear QA / Gas Report guidelines are put in place. I understand that the new rules around NC issues and gas optimizations with the optimizer turned on are a step in the right direction. However, there is still no distinction between significant and minuscule gas savings, which each judge awards points to differently. Additionally, an accepted HM vulnerability can fluctuate between H and M, whereas a QA report has a very strict impact (L). This magnifies the effect of judge subjectivity on Validator accuracy when these reports are assessed.
A judge will have an inherent (either subconscious or conscious) bias as to a finding's validity if they have enhanced it, introducing a human factor that will increase error rates in the judging process. As there is no upside for the judge in any case, I propose preventing judges from acting as validators altogether.
I propose that the mechanism under which wholly indivisible total reports are distributed be strictly defined (i.e.
The accuracy penalty is meant to affect a Validator's eligibility to remain one and is applied after the fact. As such, an ill-intentioned Validator can refrain from applying as a Validator until a high-reward contest shows up that they can manipulate or influence. I understand that I am thinking with a malicious mindset; however, C4 has demonstrably suffered manipulation in the past when the stakes were high (e.g. zkSync Era). C4 should be taking steps to minimize the likelihood of a contest being influenced by a single party, and the current Validator approach is insufficient. I propose that a Validator breaching their accuracy threshold immediately places the contest under review, causing all "tainted" issues to be re-assessed by the other Validators as well as the Judge.
This is an incredibly well-thought-out feature, and I am in full support of this approach. It will result in a reduction of PJQA escalations for reports, and provide better value to the Sponsor and the ultimate report generated by C4. |
Noting here for transparency that I have edited this guideline; my initial interpretation was incorrect, and the "nominate top 3" was an artifact of changing the approach to selecting the top 3 reports, and not intended to affect the validation process for QA reports. |
Thanks so much, @alex-ppg. Couple quick bullets:
On the specific suggestions here:
Seems reasonable. We will come to an answer on this before the first validation session occurs.
I think @CloudEllie and I addressed this somewhere -- our take is judges should be able to validate but not get paid for enhancements. There's a lot of overlapping work done by validators and judges, just like lookouts and judges, and it makes sense to let someone perform both roles, it just doesn't make sense for the judge to be able to enhance.
I think I agree with this, but can you explain what you're thinking in a little more detail? I'm not sure I understand exactly what you're thinking when you say "breaching accuracy threshold"
I think a better/clearer rubric will emerge here. But also, these are going to be subjective, period. With regard to most of the remaining concerns, my view is that the first judge who handles these cases will begin to set precedent, we'll iterate that via case law, and ultimately come to a standard. |
Hey @sockdrawermoney, really appreciate the attention C4 gives to our feedback! I am in alignment with most items stated in your response, and will address some of the things that require further clarification below:
I would advise analyzing whether there has been any contest whatsoever in which the judge participated as a warden and also judged it. There is an inherent bias when you have created or improved a finding, as you already consider it valid when doing so. I do not believe this is a huge blocker, but I believe there is no precedent (i.e. the same person being a warden and a judge) to expect the same guidelines to work for the validator role. I am happy to be proved wrong, however!
A Validator will have to meet some eligibility criteria when applying/being chosen to be a validator for a particular contest, including accuracy. If those criteria are no longer met as a result of the Validator's work in an active contest, all the findings they have touched should undergo a manual re-evaluation to ensure the Validator's inaccuracy has not compromised the contest's integrity. This re-evaluation should encompass all findings including those discarded at the validator repository. The above will act as a deterrent for "fresh" validators with no history whereby many mistakes will cause their percentage threshold to be "breached" and will effectively prevent a scenario whereby a validator waits until a high-reward contest comes up to apply and compromise it. I am in no way questioning the integrity of existing contest staff (i.e. judges, presorters, validators), and am simply trying to come up with a contingency scenario in case the low likelihood event of a nefarious party participating in the contest process presents itself. I also believe this is a strong indicator of "combat readiness" to external observers who are unaware of C4's inner workings and strict criteria for becoming contest staff. |
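The proposed contingency can be sketched as a small piece of state tracking. The sketch makes three assumptions. First, each validation decision is scored as correct or incorrect as the judge rules on it. Second, the eligibility threshold is a plain accuracy ratio (0.5 here, an arbitrary placeholder). Third, findings the validator closed in the validation repo are tracked as well. `ValidatorSession` and its methods are hypothetical names, not C4 tooling.

```python
from dataclasses import dataclass, field


@dataclass
class ValidatorSession:
    validator: str
    threshold: float = 0.5   # placeholder eligibility threshold (assumption)
    correct: int = 0
    incorrect: int = 0
    # Every finding this validator handled, including ones they closed.
    touched: set = field(default_factory=set)

    def record(self, finding_id: str, was_correct: bool) -> None:
        """Log a decision once the judge has ruled on it."""
        self.touched.add(finding_id)
        if was_correct:
            self.correct += 1
        else:
            self.incorrect += 1

    @property
    def accuracy(self) -> float:
        total = self.correct + self.incorrect
        return self.correct / total if total else 1.0

    def findings_to_reassess(self) -> list:
        """Empty while within threshold; every touched finding once it is breached."""
        if self.accuracy < self.threshold:
            return sorted(self.touched)
        return []


session = ValidatorSession("validator-1")
session.record("H-01", was_correct=True)
session.record("M-02", was_correct=False)
print(session.findings_to_reassess())  # [] (exactly at the threshold)
session.record("M-03-closed", was_correct=False)
print(session.findings_to_reassess())  # ['H-01', 'M-02', 'M-03-closed']
```

On a breach, the sketch returns the whole touched set, closed findings included. This matches the suggestion that re-evaluation should also cover findings discarded at the validator repository, not only the ones the validator got wrong.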
@alex-ppg It has indeed happened, but it's been a very long time since it did. I don't have precise examples at my fingertips, but some of our more long-tenured judges such as @GalloDaSballo, and perhaps @0xean, have been around long enough to recall a time when we had fewer judges available and occasionally needed people to do double duty. It's certainly not ideal, but we have always endeavoured to balance pragmatism with idealism.
Let me parse this in my own words and think through how we might make this work:
Does that match what you're proposing? Note that I'm not certain that we can commit to this immediately, but I do want to understand and explore it, as I can clearly see the value. What I can certainly imagine in the immediate term is handling the Validator RSVPs for high-reward audits as special cases, where staff would hand-select a group of Validators from the qualified pool based on accuracy stats. |
Thanks for contributing further @CloudEllie, once again appreciate the focus given on the matter! I hope more judges contribute to this thread as well and refute my statements if need be.
I see, in that case, I retract my comment and the present guidelines are adequate and would require change only if an incident occurs.
It does so precisely, and handling high-reward audit Validator RSVPs manually is also a good countermeasure. I believe I have gotten my point across as I wanted to, and the responsibility for iterating on my feedback now lies in C4's hands. I do not have any further concerns to raise at this moment and will revisit this thread if need be after the feature goes live and I have personally experienced it as either a Validator or a Judge. |
Hugely appreciate you sharpening our thinking here, @alex-ppg 🙏 |
@alex-ppg you're a gem, as always. thank you! |
I think that if a validator decides to enhance a report in the validation repo, there should be a label marking the report as enhanced. It would also be good to link the enhanced report to the original report for the judge and other security researchers. Otherwise, auditors may have to go through each judged submission, going back and forth between the validation repo and the findings repo to check whether their report was enhanced and how much was changed. |
@JeffCX All enhanced submissions will have the |
Citing my conversation with @CloudEllie from the SR channel here.
Issue
The changes to the original guidelines in this org issue above state that "Validators may not improve duplicates of submissions already present in the findings repo". The term This is a problem because, if the validation repo is not considered, we could have scenarios where proof from issue A in the validation repo is used to improve issue B in the validation repo (assuming both issues share the same root cause).
Solution
Validators should be disallowed from improving submissions with duplicates in the validation repo (unless all issues in the group are missing something important and can be collectively improved). |
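The proposed rule can be expressed as a predicate. This sketch combines the published rule (an enhancement is disregarded when a duplicate already sits in the findings repo) with the proposal above (no enhancement when the validation repo already holds a complete duplicate). Duplicates are grouped by an exact `root_cause` string, which simplifies real deduping. The `Submission` fields, especially `complete`, are hypothetical stand-ins for a judgment a person would make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    sub_id: str
    root_cause: str   # simplification: duplicates share an identical label
    repo: str         # "findings" or "validation"
    complete: bool    # hypothetical: nothing important (PoC, impact, ...) is missing


def enhancement_would_count(target: Submission, submissions: list) -> bool:
    """Whether a validator's enhancement of `target` would be honoured."""
    dupes = [s for s in submissions
             if s.root_cause == target.root_cause and s.sub_id != target.sub_id]
    # Published rule: a duplicate already in the findings repo voids enhancements.
    if any(s.repo == "findings" for s in dupes):
        return False
    # Proposed rule: validation-repo duplicates block enhancement unless every
    # one of them is also missing something, so the group can be improved together.
    return not any(s.complete for s in dupes if s.repo == "validation")


subs = [
    Submission("V-1", "stale-oracle", "validation", complete=False),
    Submission("V-2", "stale-oracle", "validation", complete=True),
    Submission("V-3", "reentrancy", "validation", complete=False),
    Submission("V-4", "rounding", "validation", complete=False),
    Submission("F-1", "rounding", "findings", complete=True),
]
by_id = {s.sub_id: s for s in subs}
for sub_id in ("V-1", "V-3", "V-4"):
    print(sub_id, enhancement_would_count(by_id[sub_id], subs))
```

Here V-1 is blocked because V-2 already proves the same root cause in the validation repo. V-4 is blocked by F-1 in the findings repo. Only the unique V-3 may be enhanced.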
Since there will be plenty of questions/discussion, I am frontrunning these by opening an issue here to have all the comms in one place.
https://code4rena.com/blog/code4rena-spring-update-2024