Fix regex pattern that parses extractedValues #7
Hi @sseifert-akamai, thanks a lot again for this contribution. While reviewing the change, I realized it also includes some auto-formatting, probably from your IDE. This is noisy and makes it hard to identify the actual fix. Could you please reduce the change to just what is relevant?
let pat = /^name=([^;]*); value=([^;]*).*$/;
return getResponseHeaderValues(response, "x-akamai-session-info")
.reduce(function (vars, value) {
let pat = /^name=(.*); value=([^\s]*)(;.*)?$/;
The regex in this line assumes that the value is followed by at most one ";" plus something to ignore, but:
- there can be 2 when the variable value is extracted from a qsp or from a cookie. In that case the variable value is followed not only by full_location_id but also by separator, and the part that includes full_location_id ends up in the value against which the assert function matches
- there can be 0 when the value is not extracted. In this case, if the variable value includes some ";", the part of the value after the last ";" unexpectedly matches the last group.
Looking closer at the groups in this regex:
- The 1st one, (.*), is capturing but may not need to be. More importantly, it can match any character, including those that don't belong in a variable name. It could be more specific.
- The 2nd one, ([^\s]*), is capturing, and this is probably needed. More importantly, it excludes spaces but it shouldn't.
- The 3rd one, (;.*)?, is capturing but may not need to be. More importantly, it could be more specific.
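As a quick illustration of the point about the 2nd group excluding spaces, the sketch below runs the regex from the diff against one header value whose variable value contains "; ". The sample string is one of the sample values used in this review; the names `reviewedPat`, `sampleHeader` and `reviewed` are only illustrative.

```javascript
// Regex under review, copied from the diff.
const reviewedPat = /^name=(.*); value=([^\s]*)(;.*)?$/;

// Sample header value whose variable value contains "; ".
const sampleHeader =
  "name=EXTRACTED_FROM_HEADER; value=key1=value1; key2=value2; full_location_id=cookieName3";

const m = sampleHeader.match(reviewedPat);

// Group 2 cannot cross a space, so it stops at the first ";",
// and the rest of the value is swallowed by group 3.
const reviewed = { name: m[1], value: m[2], rest: m[3] };
console.log(reviewed);
```

Here group 2 holds only `key1=value1`, while `; key2=value2` ends up in group 3 together with full_location_id.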
With this regex proposed instead: ^name=([^\s]*); value=(.*)(; full_location_id=[^;]*(; separator=[^;]*)?)?$
- it leaves aside the question of capturing or not
- name and value are captured in the same groups 1 and 2 as before
- group 3, which previously captured one of full_location_id or separator, now captures both if present
- capture group 4 is added and captures separator only
However, it requires the match to be ungreedy.
The attached screenshots show test results using the U flag with PCRE2 for ungreediness, with the sample values below:
1. name=NOT_EXTRACTED; value= key1=value1; key2=value2
2. name=NOT_EXTRACTED; value= key3=value3
3. name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key1=value1; key2=value2; full_location_id=cookieName3; separator=%3d
4. name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key3=value3; full_location_id=cookieName3; separator=%3d
5. name=EXTRACTED_FROM_HEADER; value=key1=value1; key2=value2; full_location_id=cookieName3
6. name=EXTRACTED_FROM_HEADER; value=key3=value3; full_location_id=cookieName3
These sample values are meant to cover:
- variable value contains ";" or not
- variable is not extracted
- variable is extracted from cookie (or qsp) or is extracted from header
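JavaScript has no equivalent of the PCRE2 U flag, so a possible way to try the proposed regex in the project's own language is to make only the value group lazy with `(.*?)`. The sketch below does that against the six sample values; treating a lazy value group as sufficient is an assumption on my side, and `lazyPat`, `parseSessionInfo` and the returned field names are hypothetical, not part of the PR.

```javascript
// Proposed regex, with only the value group made lazy (.*?) as a stand-in
// for the PCRE2 U flag (assumption: this is the only place laziness matters).
const lazyPat =
  /^name=([^\s]*); value=(.*?)(; full_location_id=[^;]*(; separator=[^;]*)?)?$/;

// The six sample values from the review, copied verbatim
// (samples 1 and 2 keep the space after "value=").
const samples = [
  "name=NOT_EXTRACTED; value= key1=value1; key2=value2",
  "name=NOT_EXTRACTED; value= key3=value3",
  "name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key1=value1; key2=value2; full_location_id=cookieName3; separator=%3d",
  "name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key3=value3; full_location_id=cookieName3; separator=%3d",
  "name=EXTRACTED_FROM_HEADER; value=key1=value1; key2=value2; full_location_id=cookieName3",
  "name=EXTRACTED_FROM_HEADER; value=key3=value3; full_location_id=cookieName3",
];

// Hypothetical helper: returns the captured groups, or null if no match.
function parseSessionInfo(headerValue) {
  const m = headerValue.match(lazyPat);
  if (m === null) return null;
  // Groups 3 and 4 are undefined when the optional parts are absent.
  return { name: m[1], value: m[2], location: m[3], separator: m[4] };
}

const results = samples.map(parseSessionInfo);
results.forEach((r) => console.log(r));
```

With the lazy group, the value stops right before `; full_location_id=` when it is present and otherwise runs to the end of the string, so semicolons inside the value are preserved.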
If greediness can't be disabled, another regex similar to the one below may be used.
It is built as an alternation where the 1st option captures full_location_id (and possibly separator) when it is present. The 2nd option should match only when full_location_id and separator are absent.
^name=([^\s]*); value=(.*)(; full_location_id=.*)$|^name=([^\s]*); value=(.*)$
A problem with this approach is that name and value are captured in groups 1 and 2 when full_location_id is present,
and in groups 4 and 5 otherwise. So this needs to be dealt with if the captured groups are actually used.
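One possible way to deal with the split groups is to fall back from groups 1 and 2 to groups 4 and 5 when reading the match. This is only a sketch: `altPat` and `parseWithAlternation` are hypothetical names, and the `??` operator assumes a reasonably recent JavaScript runtime.

```javascript
// Greedy alternation regex from the comment above.
const altPat =
  /^name=([^\s]*); value=(.*)(; full_location_id=.*)$|^name=([^\s]*); value=(.*)$/;

// Hypothetical helper that hides which alternative matched.
function parseWithAlternation(headerValue) {
  const m = headerValue.match(altPat);
  if (m === null) return null;
  return {
    // Groups 1/2 are set when full_location_id is present, 4/5 otherwise.
    name: m[1] ?? m[4],
    value: m[2] ?? m[5],
    // Undefined when the second alternative matched.
    location: m[3],
  };
}

const withLocation = parseWithAlternation(
  "name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key1=value1; key2=value2; full_location_id=cookieName3; separator=%3d"
);
const withoutLocation = parseWithAlternation(
  "name=NOT_EXTRACTED; value= key1=value1; key2=value2"
);
console.log(withLocation, withoutLocation);
```

The caller then sees the same `name` and `value` fields regardless of which side of the alternation matched.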
It would also be nice to add tests that cover the different kinds of values that may be found in the header:
for example
This change fixes an issue with extractedValues when the value contains semicolons.