Cloud spring cleaning 2024 04 30 #97

Merged · 21 commits · May 2, 2024
85 changes: 50 additions & 35 deletions awarding/incentive-model-and-awards/README.md
@@ -40,6 +40,14 @@ The resulting awards are:
| 'Warden B' | 'H-02' | '3' | 8.91 | 3 | 2.70 | 1000 |
| 'Warden C' | 'H-02' | '3' | 8.91 | 3 | 2.70 | 1000 |
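
The numbers in this table follow Code4rena's standard HM slice formula: Highs carry 10 base shares and Mediums 3, decayed by 0.9^(n-1) for n duplicates and split evenly among them, with a 30% bonus for the submission selected for the report. A minimal sketch of that calculation (the table's pie of 8.91 is consistent with 10 * 0.9^2 = 8.1 plus the 30% bonus on one selected slice):

```python
def hm_slice(risk: str, dup_count: int, selected_for_report: bool = False) -> float:
    """Approximate per-duplicate share ('slice') for an HM finding.

    Assumes the standard weights (High = 10, Med = 3), the 0.9^(n-1)
    duplicate decay, and the 30% bonus for the submission selected for
    the report. A sketch only; awards are computed by C4's tooling.
    """
    weight = 10 if risk == "high" else 3
    pie = weight * 0.9 ** (dup_count - 1)   # total shares for the finding
    slice_ = pie / dup_count                # even split across duplicates
    if selected_for_report:
        slice_ *= 1.3                       # 30% bonus for the chosen report
    return slice_

# A High with 3 duplicates: each non-selected duplicate gets
# 10 * 0.9**2 / 3 = 2.7 shares, matching the table above.
print(round(hm_slice("high", 3), 2))  # 2.7
```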

### Bonuses for top competitors
For each audit starting on or after April 30, 2024, there are two bonuses for top-performing wardens:

1. **Hunter bonus:** 10% of the HM pool will be awarded to the warden or team who identifies the greatest number of unique HMs.
2. **Gatherer bonus:** 10% of the HM pool will be awarded to the warden or team who identifies the greatest number of valid HMs.

Both bonuses weigh Highs more heavily than Mediums, similarly to Code4rena's standard awarding mechanism.
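
The precise bonus weighting isn't spelled out here; the sketch below assumes the same 10:3 High-to-Medium share weights used by the standard model, which is an assumption rather than a documented formula:

```python
# Hypothetical scoring for the two bonuses. The docs say only that Highs
# weigh more than Mediums; the 10:3 ratio below mirrors the standard HM
# share weights and is an assumption.
HIGH_W, MED_W = 10, 3

def weighted_count(findings: list[dict]) -> float:
    return sum(HIGH_W if f["risk"] == "high" else MED_W for f in findings)

def hunter_score(findings: list[dict]) -> float:
    """Hunter bonus: counts only unique (solo) HMs."""
    return weighted_count([f for f in findings if f["unique"]])

def gatherer_score(findings: list[dict]) -> float:
    """Gatherer bonus: counts all valid HMs."""
    return weighted_count(findings)

findings = [{"risk": "high", "unique": True}, {"risk": "med", "unique": False}]
print(hunter_score(findings), gatherer_score(findings))  # 10 13
```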

### Duplicates getting partial credit

All issues which identify the same functional vulnerability will be considered duplicates regardless of effective rationalization of severity or exploit path.
@@ -103,66 +111,48 @@ We can see here that the logic behind the `partial-` labels only impacts the awards

Only the award amounts for "partial" findings have been reduced, in line with expectations. The aim of this adjustment is to recalibrate the rewards allocated for these specific findings. Meanwhile, the awards for full-credit findings remain unchanged.
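
A minimal sketch of that adjustment, assuming each `partial-` label carries a fraction (e.g. `partial-50`) that scales the duplicate's normal slice:

```python
# Hypothetical helper: scale a duplicate's slice by its `partial-` label.
# Assumes labels of the form partial-25 / partial-50 / partial-75.
PARTIAL = {"partial-25": 0.25, "partial-50": 0.50, "partial-75": 0.75}

def partial_slice(full_slice: float, label: str | None = None) -> float:
    return full_slice * PARTIAL.get(label, 1.0)  # no label = full credit

print(partial_slice(2.7, "partial-50"))  # 1.35, half of a full duplicate
```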

## Bot races

The first hour of each Code4rena audit is devoted to a bot race, to incentivize high quality automated findings as the first wave of the audit.

- The winning bot report is selected and shared with all wardens within 24 hours of the audit start time.
- The full set of issues identified by the best automated tools are considered out of scope for the audit and ineligible for awards.

Doing this eliminates the enormous overlapping effort of all wardens needing to document common low-hanging issues. And because the best bot report is shared with auditors at the start of the audit, these findings serve as a thorough starting place for understanding the codebase and where weaknesses may exist.

**Ultimately, the bot race ensures human auditors are focused on things humans can do.**

By designating a portion of the pool in this direction, Code4rena creates a separate lane for the significant investment of effort that many auditors already make in automated tooling -- and rather than awarding 100 people for identifying the same issue, we award the best automated tools.

## Analyses
### Validator-improved submissions

Each warden is encouraged to submit an Analysis alongside their findings for each audit, to share high-level advice and insights from their review of the code.

Where individual findings are the "trees" in an audit, the Analysis is a "forest"-level view.

Advanced-level Analyses compete for a portion of each audit's award pool, and are graded and awarded similarly to QA and Gas Optimization reports.
[Validators](../../roles/certified-contributors/validators.md) may enhance submissions (add PoC, increase quality of report, etc.) in exchange for a % of the finding’s payout.

For Validator-improved submissions: if the judge believes the validator added a measurable enhancement, they receive a split of the issue's value (see the sketch after this list):
- 25% cut → small enhancement = moved submission from unsatisfactory to satisfactory
- 50% cut → med enhancement = moved submission from invalid to valid
- 75% cut → large enhancement = identified a more severe vulnerability
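
A minimal sketch of that split, treating the validator's cut as a straight percentage of the finding's payout:

```python
# Sketch of the validator/warden split for validator-improved submissions,
# per the three enhancement tiers above.
VALIDATOR_CUT = {"small": 0.25, "medium": 0.50, "large": 0.75}

def split_payout(payout: float, enhancement: str) -> tuple[float, float]:
    cut = VALIDATOR_CUT[enhancement]
    return payout * cut, payout * (1 - cut)  # (validator share, warden share)

print(split_payout(1000.0, "medium"))  # (500.0, 500.0)
```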

## QA and Gas Optimization Reports

In order to incentivize wardens to focus efforts on high and medium severity findings while also ensuring quality coverage, the pool’s allocation is capped for low severity, non-critical, and gas optimization findings.
In order to incentivize wardens to focus efforts on high and medium severity findings while also ensuring quality coverage, the pool’s allocation is capped for low severity, governance, and gas optimization findings.

Low and non-critical findings are submitted as a **single** QA report. Similarly, gas optimizations are submitted as a single gas report. For more on reports, see [Judging criteria](/awarding/judging-criteria/README.md).
Low and governance findings are submitted as a **single** QA report. Similarly, gas optimizations are submitted as a single gas report. For more on reports, see [Judging criteria](/awarding/judging-criteria/README.md).

QA and gas optimization reports are awarded on a curve based on the judge’s score.

- QA reports compete for a share of 2.5% of the prize pool (e.g. $1,250 for a $50,000 audit);
- The gas optimization pool varies from audit to audit, but is typically 2.5% of the total prize pool (e.g. $1,250 for a $50,000 audit);
- QA and Gas optimization reports are scored by judges using A/B/C grades (with C = unsatisfactory), and awarded on a curve.
- QA reports compete for a share of 4% of the prize pool (e.g. $2,000 for a $50,000 audit);
- The gas optimization pool varies from audit to audit;
- QA and Gas optimization reports are awarded on a curve.

The bar for quality and value in QA and gas optimization reports is very high. Only submissions that demonstrate full effort worthy of consideration for inclusion in the report will be eligible for rewards.

It is highly recommended to clearly spell out the impact of proposed gas optimizations.

Historically, Code4rena valued non-critical findings at 0; the intent of the QA report is not to increase the value of non-criticals, but rather to allow them to be consolidated in reports alongside low severity issues.

**Note:** Audits pre-dating February 3, 2022 awarded low risk and gas optimization shares as: `Low Risk Shares: 1 * (0.9 ^ (findingCount - 1)) / findingCount`

In the unlikely event that zero high- or medium-risk vulnerabilities are found, the HM award pool will be divided based on the QA Report curve.

## Grades for Analyses, QA and Gas reports

Analyses, QA reports and Gas reports are graded A, B, or C.
### Ranks for QA and Gas reports

C scores are unsatisfactory and ineligible for awards.
_These guidelines apply to all audits starting on or after April 30, 2024._

All A-grade reports receive a score of 2; All B-grade reports get a 1. Awarding for QA and Gas reports is on a curve that's described [here](https://docs.code4rena.com/awarding/incentive-model-and-awards/curve-logic).
After judging is complete, the Judge and Validators vote to select the top 3 QA reports and Gas reports. (In the case of a tie vote, there may be a 4th place report.)

### Bonus for best / selected for report
Judges choose the best report in each category (Analysis, QA report, and Gas report), each of which earns the same 30% share bonus described under "High and Medium Risk bugs."
The 1st, 2nd, and 3rd place winners are awarded using a curve model that will be documented here ASAP.

**Note:** if the `selected for report` submission has a B-grade label, it will still be treated as A-grade and given proportionally more than B-grade, plus the 30% bonus for being `selected for report`.
Satisfactory reports not among the winning reports will not be awarded -- but will count towards wardens' accuracy scores.

## Satisfactory / unsatisfactory submissions

Any submissions deemed unsatisfactory are ineligible for awards.
Any submissions deemed unsatisfactory are ineligible for awards, and count against wardens' accuracy scores.

The bar for satisfactory submissions is that they are roughly at a level that could be found in a draft report by a professional auditor: specifically on the merits of technical substance, with writing quality considered only where it interferes with comprehension of the technical message.

@@ -176,3 +166,28 @@ It is possible for a submission to be *technically* valid and still unsatisfactory
- approach is disrespectful of sponsors’ and judges’ time in some way

Any submissions that appear to be direct copies of other reports in the current audit will be collectively deemed unsatisfactory.

## Other submission types

As of April 30, 2024, the following submission types are paused:

### Bot reports

The first hour of each Code4rena audit is devoted to a bot race, to incentivize high quality automated findings as the first wave of the audit.

- The winning bot report is selected and shared with all wardens within 24 hours of the audit start time.
- The full set of issues identified by the best automated tools are considered out of scope for the audit and ineligible for awards.

Doing this eliminates the enormous overlapping effort of all wardens needing to document common low-hanging issues. And because the best bot report is shared with auditors at the start of the audit, these findings serve as a thorough starting place for understanding the codebase and where weaknesses may exist.

**Ultimately, the bot race ensures human auditors are focused on things humans can do.**

By designating a portion of the pool in this direction, Code4rena creates a separate lane for the significant investment of effort that many auditors already make in automated tooling -- and rather than awarding 100 people for identifying the same issue, we award the best automated tools.

### Analyses

Analyses share high-level advice and insights from wardens' review of the code.

Where individual findings are the "trees" in an audit, the Analysis is a "forest"-level view.

Analyses compete for a portion of each audit's award pool, and are graded and awarded similarly to QA and Gas Optimization reports.
2 changes: 1 addition & 1 deletion awarding/incentive-model-and-awards/awarding-process.md
@@ -16,7 +16,7 @@ Judging data is used to generate the awards using Code4rena's award calculation
- Risk level
- Validity
- Number of duplicates
- Grade (A, B, C; Satisfactory/Unsatisfactory)
- Rank (1st, 2nd, 3rd; Satisfactory/Unsatisfactory)
- In some cases, "partial duplicate" status

It should be possible to reverse engineer awards using a combination of two CSV files:
18 changes: 7 additions & 11 deletions awarding/incentive-model-and-awards/qa-gas-report-faq.md
@@ -1,32 +1,28 @@
# FAQ about QA and Gas Reports

This FAQ pertains to the award mechanism update that takes effect February 3, 2022, which changes the submission guidelines for low-risk, non-critical, and gas optimization reports. For more details, see [Judging Criteria](https://docs.code4rena.com/roles/wardens/judging-criteria).
This FAQ pertains to the award mechanism update that takes effect April 30, 2024, which changes the submission guidelines for low-risk, non-critical, and gas optimization reports. For more details, see [Judging Criteria](https://docs.code4rena.com/roles/wardens/judging-criteria).

### What happens to the award pool if no Med/High vulns are found?

Unless otherwise stipulated in the audit repo, the full pool would then be divided based on the QA Report curve.

### Will non-critical findings hold some weight? Just want to know if it's worth spending a considerable amount of time writing this part of the report.
### Can I still include non-critical findings in my QA report?

The full QA report will be graded on a curve against the other reports. We'll be experimenting together as a community with this, but we think we'll learn a lot and it will be interesting to see the best practices emerge.
Non-critical findings are discouraged for QA reports.

We are intentionally not providing an "example," as we are eager to see what approaches folks take and to be able to learn from a variety of approaches.

### What if a low-impact QA report turns out to be a high-impact report? How does that work with the 10% prize pool? Would the report be upgraded?
### What if a low-impact QA report turns out to be a high-impact report? Would the report be upgraded?

It's conceivable it could be upgraded, though it's important to consider that part of auditing is demonstrating proper theory of how an issue could be exploited. If a warden notices something is "off" but is unable to articulate why it could lead to loss of funds, for example, the job is only half-done; without understanding the implications, a developer could very well overlook or deprioritize the issue.

The tl;dr for determining severity is relatively clear with regard to separating by impact.
The tl;dr for [determining severity](../../awarding/judging-criteria/severity-categorization.md) is relatively clear with regard to separating by impact.

### What happens when an issue submitted by the warden as part of their QA report (an L or N) *DOES* get bumped up to Med/High by the judge after review?
### What happens when an issue submitted by the warden as part of their QA report (an L or C) *DOES* get bumped up to Med/High by the judge after review?

If it seemed appropriate to do so based on a judge's assessment of the issue, they could certainly choose to do this.

The judge could create a new separate Github issue in the findings repo that contains the relevant portions of the warden's QA report, and add that to the respective H or M level bucket.

However, QA items may be marked as a duplicate of another finding *without* being granted an upgrade, since making the case for *how* an issue can be exploited, and providing a thorough description and proof of concept, is part of what merits a finding properly earning medium or high severity.

### Conversely, in the reverse situation where an issue submitted by wardens as H/M level, is subsequently downgraded to QA level by the judge during their review, would the penalty just be excluding the overrated warden submission from consideration in regards to the QA rewards?

We'll need to see how it works in reality, but our current assumption is that (a) low severity findings attempted to get pushed into med/high would essentially get zero (just logically so since they wouldn't be high or med), and then (b) their QA report would be lower quality as a result, and so they wouldn't score as highly as they could have. Judges could also decide to mark off points in someone's QA report if they saw behavior that seemed like it might be trying to game for higher rewards by inflating severity, so it could have a negative consequence as well.
In theory, findings downgraded to QA are grouped together with the warden's QA report (if one exists). In practice, however, we have found that downgraded issues do not have a significant impact on wardens' overall QA score. Judges can also decide to mark off points in someone's QA report if they see behavior that seems like it might be trying to game for higher rewards by inflating severity, so it can have a negative consequence as well.

54 changes: 27 additions & 27 deletions awarding/judging-criteria/README.md
@@ -69,8 +69,35 @@ The scoring system has three primary goals:
* Hardening C4 code audits to Sybil attacks
* Encouraging coordination by incentivizing Wardens to form teams.

### QA reports (low risk and governance)

Low risk and Governance findings must be submitted as a _single_ QA report per warden. We allocate a **fixed 4% of prize pools toward QA reports.**

QA reports should include:

* all low severity findings; and
* all Governance findings.

Each QA report should be assessed based on report quality and thoroughness as compared with other reports, with awards distributed on a curve.

Judges have discretion to assign a lower grade to wardens overstating the severity of QA issues (submitting low/non-critical issues as med/high in order to angle for higher payouts). Judges may also raise the severity of a QA finding at their discretion.

### Gas reports

Gas reports should be submitted using the **same approach as the QA reports:** a single submission per warden which includes all identified optimizations.

Gas pools are optional, but for audits that include Gas optimizations, the precise award pool can be found in that audit's repo.

## Estimating Risk

See [Severity Categorization](https://docs.code4rena.com/awarding/judging-criteria/severity-categorization).

## Other report types

### Analysis

_This report type is currently paused, and is not accepted for audits starting on or after April 30, 2024._

Analyses are judged A, B, or C, with C being unsatisfactory and ineligible for awards. The judge selects the best Analysis for inclusion in the audit report.

An analysis is a written submission outlining:
@@ -99,30 +126,3 @@ Areas of interest include:
- Weakspots and any single points of failure

Merely repeating the code functionality in pseudo-documentation is not considered valuable information.

### QA reports (low/non-critical)

QA reports are graded A, B, or C, with C being unsatisfactory and ineligible for awards. The judge selects the best QA report for inclusion in the audit report.

Low and non-critical findings must be submitted as a _single_ QA report per warden. We allocate a **fixed 2.5% of prize pools toward QA reports.**

QA reports should include:

* all low severity findings; and
* all non-critical findings.

Each QA report should be assessed based on report quality and thoroughness as compared with other reports, with awards distributed on a curve.

Judges have discretion to assign a lower grade to wardens overstating the severity of QA issues (submitting low/non-critical issues as med/high in order to angle for higher payouts). Judges may also raise the severity of a QA finding at their discretion.

### Gas reports

Gas reports are graded A, B, or C, with C being unsatisfactory and ineligible for awards. The judge selects the best Gas report for inclusion in the audit report.

Gas reports should be submitted using the **same approach as the QA reports:** a single submission per warden which includes all identified optimizations. The gas pool is allocated on a curve.

The gas pool varies from audit to audit, but typically it consists of 2.5% of the total prize pool. The precise gas pool for each audit can be found in that audit's repo.

## Estimating Risk

See [Severity Categorization](https://docs.code4rena.com/awarding/judging-criteria/severity-categorization).
4 changes: 2 additions & 2 deletions awarding/judging-criteria/severity-categorization.md
@@ -2,15 +2,15 @@

Where **assets** refer to funds, NFTs, data, authorization, and any information intended to be private or confidential:

* **QA (Quality Assurance)** Includes both **Non-critical** (code style, clarity, syntax, versioning, off-chain monitoring (events, etc.)) and **Low risk** (e.g. assets are not at risk: state handling, function incorrect as to spec, issues with comments). Excludes Gas optimizations, which are submitted and judged separately.
* **QA (Quality Assurance)** Includes **Low risk** (e.g. assets are not at risk: state handling, function incorrect as to spec, issues with comments) and **Governance** (centralization risks, admin privileges). Excludes Gas optimizations, which are submitted and judged separately. Non-critical issues (code style, clarity, syntax, versioning, off-chain monitoring (events, etc.)) are discouraged.
* **2 — Med:** Assets not at direct risk, but the function of the protocol or its availability could be impacted, or leak value with a hypothetical attack path with stated assumptions, but external requirements.
* **3 — High:** Assets can be stolen/lost/compromised directly (or indirectly if there is a valid attack path that does not have hand-wavy hypotheticals).

## Centralization risks

Submissions describing centralization risks should be submitted as follows:

- Direct misuse of privileges shall be submitted in the Analysis report.
- Direct misuse of privileges shall be submitted in the QA report.
- Reckless admin mistakes are invalid. Assume calls are previewed.
- Mistakes in code only unblocked through admin mistakes should be submitted within a QA Report.
- Privilege escalation issues are judged by likelihood and impact and their severity is uncapped.