Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regex pattern that parses extractedValues #7

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/js/akamai.js
Original file line number Diff line number Diff line change
Expand Up @@ -203,7 +203,7 @@ akamai = {
* as an object.
*/
extractedValues(response) {
let pat = /^name=([^;]*); value=([^;]*).*$/;
let pat = /^name=(.*); value=([^\s]*)(;.*)?$/;
Copy link
Contributor

@bacalec bacalec Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex in this line is assuming that there is at most 1 ";" followed by something that needs to be ignored after the value but

  • there can be 2 when variable value is extracted from qsp or from cookie. In this case variable value is not only followed by full_location_id but also by separator. In that case, the part that includes full_location_id is included in the value against which assert function is matching
  • there can be 0 when value is not extracted. In this case, if variable value includes some ";", the part of this value after the last ";" matches unepectedly with the last group.

Looking closer at groups in this regex:

  1. The 1st one (.*) is capturing but it may not need to. More importantly, it can match with any caracter including those that don't belong to a variable name. It could be more specific.
  2. The 2nd one ([^\s]*) is capturing and this is probably needed. More importantly, it excludes spaces but it shouldn't.
  3. The 3rd one (;.*)? is capturing but it may not need to. More importantly, it could be more specific.

With this regex proposed instead ^name=([^\s]*); value=(.*)(; full_location_id=[^;]*(; separator=[^;]*)?)?$:

  • it is leaving asside capturing or not
  • name and value are captured in same group 1 and 2 as before
  • group 3 that captured one of full_location_id or separator before is now capturing both if present
  • capture group 4 is added and it captures separator only

However, it requires the match to be ungreedy.

Attached screenshots show test results with U flag with PCRE2 for ungridiness, with sample values below:
proposedRegex1_ungreedy_NotExtracted

  1. name=NOT_EXTRACTED; value= key1=value1; key2=value2
  2. name=NOT_EXTRACTED; value= key3=value3

proposedRegex1_ungreedy_ExtractedFromCookieOrQsp
3. name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key1=value1; key2=value2; full_location_id=cookieName3; separator=%3d
4. name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key3=value3; full_location_id=cookieName3; separator=%3d

proposedRegex1_ungreedy_ExtractedFromHeader
5. name=EXTRACTED_FROM_HEADER; value=key1=value1; key2=value2; full_location_id=cookieName3
6. name=EXTRACTED_FROM_HEADER; value=key3=value3; full_location_id=cookieName3

These sample values are meant to cover:

  • variable value contains ";" or not
  • variable is not extracted
  • variable is extracted from cookie (or qsp) or is extracted from header

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a greedy match, the 1st new proposed regex capture full_location_id and separator in the group that captures value, which is not wanted.
proposedRegex1_greedy_ExtractedFromCookieOrQsp

Copy link
Contributor

@bacalec bacalec Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If greediness can't be disabled, another regex similar to the one below may be used.

It it's built with an alternative where 1st option captures full_location (and eventually separator) when it's present. 2nd option should match only when full_location and separator are absent.
^name=([^\s]*); value=(.*)(; full_location_id=.*)$|^name=([^\s]*); value=(.*)$

A problem with this approach is that name and value are captured in group 1 and 2 when full_location_id is present
and in groups 4 and 5 otherwise. So this needs to be dealt with if captured groups are actually used.

proposedRegex2_greedy_NotExtracted

proposedRegex2_greedy_ExtractedFromHeader

proposedRegex2_greedy_ExtractedFromCookieOrQsp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the group that captures the value of the variable, the greedy quantifier * can be replaced by the lazy quantifier *?:
^name=([^\s]*); value=(.*?)(; full_location_id=[^;]*(; separator=[^;]*)?)?$
proposedRegex3_lazyQuantifier_ExtractedFromCookieOrQsp

return getResponseHeaderValues(response, "x-akamai-session-info")
.reduce(function (vars, value) {
if (pat.test(value)) {
Expand Down