feat: allow selectors on *_NAMES collections #1143

Open
wants to merge 20 commits into
base: main


blotus
Contributor

@blotus blotus commented Sep 4, 2024

Hello,

This PR aims to allow the use of rules such as SecRule &REQUEST_COOKIES_NAMES:JSESSIONID "@eq 0" "id:45" (supported by ModSecurity and also present as an example in the documentation of Coraza), which currently causes Coraza to crash due to an explicit panic call.

There are 3 main changes:

  • Check whether a collection supports selectors at parse time, instead of failing at runtime.
  • Make collections.NamedCollectionNames implement collection.Keyed: this allows the use of a selector on collections created with .Names().
  • Remove runtime panic calls: as Coraza is designed to be embedded in other software, calling panic is never a good idea.

Parser changes

I've embedded information about whether a collection can be selected or not in the internal/variables/variables.go file, as a comment for each collection that does support it (hopefully, I did not miss any), and added a CanBeSelected method that is called during parsing to check if the selector is allowed or not.

I don't know if I'm really happy with embedding information in comments, but it was the least intrusive way I found to handle this.
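The parse-time check described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ruleVariable`, `variableNames`, `canBeSelected`, and `parseVariable` are hypothetical stand-ins for the real types in internal/variables and the rule parser, and the set of selectable variables is assumed for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ruleVariable is a hypothetical stand-in for Coraza's RuleVariable type.
type ruleVariable int

const (
	requestCookies ruleVariable = iota
	requestCookiesNames
	requestURI
)

// variableNames maps rule syntax to variables (assumed subset, for illustration).
var variableNames = map[string]ruleVariable{
	"REQUEST_COOKIES":       requestCookies,
	"REQUEST_COOKIES_NAMES": requestCookiesNames,
	"REQUEST_URI":           requestURI,
}

// canBeSelected reports whether the variable accepts a ":selector" suffix.
func (v ruleVariable) canBeSelected() bool {
	switch v {
	case requestCookies, requestCookiesNames:
		return true
	default:
		return false
	}
}

var errSelectorNotSupported = errors.New("selector not supported")

// parseVariable splits "NAME:selector" and rejects a selector on a
// variable that does not support one, so the mistake surfaces while the
// configuration is parsed rather than while a request is processed.
func parseVariable(raw string) (ruleVariable, string, error) {
	name, selector, hasSelector := strings.Cut(raw, ":")
	v, ok := variableNames[name]
	if !ok {
		return 0, "", fmt.Errorf("unknown variable %q", name)
	}
	if hasSelector && !v.canBeSelected() {
		return 0, "", fmt.Errorf("%s: %w", name, errSelectorNotSupported)
	}
	return v, selector, nil
}

func main() {
	for _, raw := range []string{"REQUEST_COOKIES_NAMES:JSESSIONID", "REQUEST_URI:foo"} {
		_, selector, err := parseVariable(raw)
		fmt.Printf("%s -> selector=%q err=%v\n", raw, selector, err)
	}
}
```

With a check of this shape, a misconfigured rule is reported as an ordinary error to whoever loads the configuration.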

collections.NamedCollectionNames implements collection.Keyed

This one is straightforward: NamedCollectionNames now implements Get, FindString and FindRegex.
Because the collection only exposes names, the key and the value in each returned result are the same: the name of the key.
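To illustrate the "key and value are both the name" behaviour, here is a small sketch. The `namesView` and `match` types are hypothetical simplifications, not Coraza's NamedCollectionNames or types.MatchData, and details such as key case handling are left out.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// match is a hypothetical simplification of a match result.
type match struct {
	Key   string
	Value string
}

// namesView exposes only the names (keys) of an underlying multi-value map.
type namesView struct {
	data map[string][]string
}

// Get returns the name once per value stored under it.
func (n namesView) Get(key string) []string {
	var res []string
	for range n.data[key] {
		res = append(res, key)
	}
	return res
}

// FindString returns one match per stored value, with Key == Value == name.
func (n namesView) FindString(key string) []match {
	var res []match
	for range n.data[key] {
		res = append(res, match{Key: key, Value: key})
	}
	return res
}

// FindRegex does the same for every name matching the pattern.
func (n namesView) FindRegex(rx *regexp.Regexp) []match {
	keys := make([]string, 0, len(n.data))
	for k := range n.data {
		if rx.MatchString(k) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // map order is random; sort for stable output
	var res []match
	for _, k := range keys {
		res = append(res, n.FindString(k)...)
	}
	return res
}

func main() {
	cookies := namesView{data: map[string][]string{
		"JSESSIONID": {"abc"},
		"theme":      {"dark", "light"},
	}}
	fmt.Println(len(cookies.FindString("JSESSIONID"))) // what &...:JSESSIONID would count
	fmt.Println(cookies.FindRegex(regexp.MustCompile(`^the`)))
}
```

A rule such as &REQUEST_COOKIES_NAMES:JSESSIONID "@eq 0" then amounts to checking that the selected result is empty.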

Remove runtime panic

The first two panics were removed as part of making NamedCollectionNames implement Keyed.

The other two (which are the ones that caused the crash mentioned at the beginning of this PR) have been replaced by an error log.
In theory, this log should never occur because selectability is now checked during parsing (in practice, it could happen if a collection is marked as selectable but does not implement Keyed).
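The "log instead of panic" fallback can be sketched like this. All names here (`keyed`, `cookieMap`, `singleValue`, `getField`, and the `logf` callback) are assumptions made for the illustration; the real code uses Coraza's own collection interfaces and debug logger.

```go
package main

import "fmt"

// keyed is a hypothetical, reduced version of a selectable collection.
type keyed interface {
	FindString(key string) []string
}

// cookieMap supports selection by key.
type cookieMap map[string][]string

func (c cookieMap) FindString(key string) []string { return c[key] }

// singleValue is a collection without key selection.
type singleValue string

// getField returns the values selected by key. If the collection is not
// keyed, it logs an error and returns no matches instead of panicking.
func getField(col any, name, key string, logf func(format string, args ...any)) []string {
	if k, ok := col.(keyed); ok {
		return k.FindString(key)
	}
	// Should be unreachable once selectability is validated at parse time.
	logf("attempted to use selector with non-selectable collection %s", name)
	return nil
}

func main() {
	logf := func(format string, args ...any) {
		fmt.Println("ERROR:", fmt.Sprintf(format, args...))
	}
	fmt.Println(getField(cookieMap{"JSESSIONID": {"abc"}}, "REQUEST_COOKIES", "JSESSIONID", logf))
	fmt.Println(getField(singleValue("/index"), "REQUEST_URI", "foo", logf))
}
```

The trade-off is that a broken invariant becomes a logged, empty result rather than a crash of the host process.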

@blotus blotus requested a review from a team as a code owner September 4, 2024 12:32

codecov bot commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 59.37500% with 39 lines in your changes missing coverage. Please review.

Project coverage is 81.57%. Comparing base (edad234) to head (41ca211).
Report is 12 commits behind head on main.

Files with missing lines                  Patch %   Lines
internal/variables/variablesmap.gen.go    50.00%    26 Missing ⚠️
internal/corazawaf/transaction.go         8.33%     11 Missing ⚠️
internal/collections/named.go             93.10%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1143      +/-   ##
==========================================
- Coverage   81.69%   81.57%   -0.13%     
==========================================
  Files         169      169              
  Lines        9770     9855      +85     
==========================================
+ Hits         7982     8039      +57     
- Misses       1537     1565      +28     
  Partials      251      251              
Flag                                            Coverage           Δ
coraza.rule.case_sensitive_args_keys            81.53% <59.37%>    (-0.13%) ⬇️
coraza.rule.multiphase_valuation                81.57% <59.37%>    (-0.13%) ⬇️
coraza.rule.no_regex_multiline                  81.51% <59.37%>    (-0.13%) ⬇️
default                                         81.57% <59.37%>    (-0.13%) ⬇️
examples+                                       16.33% <2.08%>     (-0.22%) ⬇️
examples+coraza.rule.case_sensitive_args_keys   81.53% <59.37%>    (-0.13%) ⬇️
examples+coraza.rule.multiphase_valuation       81.40% <59.37%>    (-0.13%) ⬇️
examples+coraza.rule.no_regex_multiline         81.43% <59.37%>    (-0.13%) ⬇️
examples+memoize_builders                       81.53% <59.37%>    (-0.13%) ⬇️
examples+no_fs_access                           80.86% <59.37%>    (-0.13%) ⬇️
ftw                                             81.57% <59.37%>    (-0.13%) ⬇️
memoize_builders                                81.66% <59.37%>    (-0.13%) ⬇️
no_fs_access                                    81.02% <59.37%>    (-0.13%) ⬇️
tinygo                                          81.54% <59.37%>    (-0.13%) ⬇️


@jptosso
Member

jptosso commented Sep 4, 2024

Interesting, thank you very much for your contribution

I'm a bit worried about how the complexity of variables is growing. Maybe not for this PR, but we need to improve code generation, even for this "selectable" feature.

Comment on lines +215 to +217
// CanBeSelected returns true if the variable supports selection (ie, `:foobar`)
func (v RuleVariable) CanBeSelected() bool {
	switch v {
Member

I don't think listing every variable here makes sense. Just return true for those that can be selected, and use the default otherwise.

Member

Although for performance it makes sense, I believe this is easier to maintain and more readable

Member

This is auto-generated, so I would not be concerned about readability. @blotus could you do a quick benchmark on this, i.e. listing every case versus listing only the true cases and leaving the rest to a default?

Contributor Author

I agree that readability does not really matter here.
I've updated the code generation to only generate the cases where true is returned.
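For reference, the two shapes being compared could be benchmarked along these lines. This is only a sketch under assumptions: the `variable` type and its constants are invented for the example (the real generated file covers many more variables), and no measured results are claimed here. `testing.Benchmark` is used so the comparison runs as a plain program.

```go
package main

import (
	"fmt"
	"testing"
)

// variable is a hypothetical stand-in for the generated RuleVariable type.
type variable int

const (
	args variable = iota
	argsNames
	requestCookies
	requestCookiesNames
	requestURI
	requestMethod
	numVariables // sentinel used for iteration, not a real variable
)

// exhaustive lists every variable explicitly.
func exhaustive(v variable) bool {
	switch v {
	case args:
		return true
	case argsNames:
		return true
	case requestCookies:
		return true
	case requestCookiesNames:
		return true
	case requestURI:
		return false
	case requestMethod:
		return false
	}
	return false
}

// trueOnly lists only the selectable variables and defaults to false.
func trueOnly(v variable) bool {
	switch v {
	case args, argsNames, requestCookies, requestCookiesNames:
		return true
	default:
		return false
	}
}

var sink bool // keeps the compiler from discarding the calls

func bench(f func(variable) bool) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = f(variable(i % int(numVariables)))
		}
	})
}

func main() {
	fmt.Println("exhaustive:", bench(exhaustive))
	fmt.Println("true-only: ", bench(trueOnly))
}
```

Both functions return the same answer for every variable, so the choice only affects generated code size and, possibly, speed.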

@fzipi
Member

fzipi commented Sep 8, 2024

I'm a bit worried about how the complexity of variables is growing. Maybe not for this PR, but we need to improve code generation, even for this "selectable" feature.

Definitely not for this PR. We should create an issue to refactor generation then.

@fzipi fzipi changed the title Allow selectors on *_NAMES collections feat: allow selectors on *_NAMES collections Sep 8, 2024
@jptosso
Member

jptosso commented Sep 18, 2024

LGTM in general, but I believe this lacks negative tests and it's decreasing the overall project coverage.

@@ -101,11 +101,41 @@ type NamedCollectionNames struct {
 }

 func (c *NamedCollectionNames) FindRegex(key *regexp.Regexp) []types.MatchData {
-	panic("selection operator not supported")
+	var res []types.MatchData
Member

Is there any chance data is empty? If so, I would handle the empty case before this allocation.

Contributor Author

Yes, there are probably situations where data can be empty (I haven't tested, but I'd expect a collection like XML to have empty data on a non-XML request).

AFAIK, declaring a slice like this does not perform any actual allocation (other than the header of the slice, which will be all zero), and the actual allocation will be performed the first time we append to it, but I can add a check on data if you are worried about it.
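The point about `var res []T` can be checked with a few lines of Go. The `collect` helper below is hypothetical and only mirrors the declare-then-append pattern under discussion.

```go
package main

import "fmt"

// collect mirrors the pattern in the diff: declare a nil slice, then
// append only when something matches.
func collect(values []string, keep func(string) bool) []string {
	var res []string // nil slice: zero-valued header, no backing array yet
	for _, v := range values {
		if keep(v) {
			res = append(res, v) // first append allocates the backing array
		}
	}
	return res
}

func main() {
	all := func(string) bool { return true }

	none := collect(nil, all)
	fmt.Println(none == nil, len(none), cap(none)) // true 0 0

	some := collect([]string{"a", "b"}, all)
	fmt.Println(some == nil, len(some)) // false 2
}
```

When nothing matches, the function returns the nil slice untouched, so an explicit early return for empty data would not save an allocation.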

Contributor Author

Also, from what I see, this check is not performed in the existing code (here for example)

for k, data := range c.collection.Map.data {
	if key.MatchString(k) {
		for _, d := range data {
			res = append(res, &corazarules.MatchData{
Member

I wonder if MatchData is mutable, if so we probably want to reuse the pointer?

@@ -574,13 +574,15 @@ func (tx *Transaction) GetField(rv ruleVariableParams) []types.MatchData {
 			if m, ok := col.(collection.Keyed); ok {
 				matches = m.FindRegex(rv.KeyRx)
 			} else {
-				panic("attempted to use regex with non-selectable collection: " + rv.Variable.Name())
+				// This should probably never happen, selectability is checked at parsing time
+				tx.debugLogger.Error().Str("collection", rv.Variable.Name()).Msg("attempted to use regex with non-selectable collection")
Member

We discussed some time ago that a panic is OK here, as this is a low-level issue and Coraza should not keep running in this state.

Contributor Author

I'm not sure I agree with this: Coraza is designed as a library, and from my point of view, this means that explicit panics must be avoided at all costs (with very few exceptions, if you can call panic, you can return an error), and doing nothing is almost always better than bringing down a production website.

If a function call can lead to a panic, it should be made very clear to the caller (either with an explicit function name (Must....) or, at the very least, with some documentation): I don't mind wrapping every call to coraza with a recover, but I need to be aware it's required.

For this specific case, it can only (AFAIK) be triggered by a configuration error, so this means it should be detected when parsing the configuration (and is now thanks to this PR), so the panic has become redundant.
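The convention argued for above (return an error by default, reserve panics for functions whose name says so, and let callers recover if they must) could look like this in Go. `compileRule`, `mustCompileRule`, and `safely` are hypothetical names for the illustration and are not part of Coraza's API.

```go
package main

import (
	"errors"
	"fmt"
)

// compileRule reports bad input through an error, as a library normally would.
func compileRule(raw string) (string, error) {
	if raw == "" {
		return "", errors.New("empty rule")
	}
	return "compiled:" + raw, nil
}

// mustCompileRule panics on error; the Must prefix warns the caller.
func mustCompileRule(raw string) string {
	r, err := compileRule(raw)
	if err != nil {
		panic(err)
	}
	return r
}

// safely runs fn and converts a panic into an error, which is what an
// embedding application has to do when a library may panic.
func safely(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	if _, err := compileRule(""); err != nil {
		fmt.Println("error returned:", err)
	}
	err := safely(func() { mustCompileRule("") })
	fmt.Println("panic converted:", err)
}
```

The wrapper only helps callers who know they need it, which is why the naming or documentation of panicking functions matters.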

6 participants