-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix '$\/]' in HTML::Entities::encode_entities #45
Conversation
I think it would be worth keeping parity with URI::Escape, which allows character classes like I fixed some similar issues in URI::Escape in libwww-perl/URI#112. It also eliminated the unneeded subref and string eval. I think I'd prefer to also fix that here as long as we're making changes. But the tests in this PR demonstrate a mistake in that PR. It doesn't properly account for all double |
encode_entities() generates and evals custom code at runtime (as a performance optimization). However, its code generation was too naive and certain characters could be used to break out of the character class regex: - `$` (dollar sign) would trigger perl's variable interpolation - `]` and `/` (the character class and regex delimiters, respectively) The latter two were usually escaped, but not if they were preceded by `\` in the input, even if that `\` was itself escaped by another `\` (NB: this is why it is generally a mistake to handle escaping logic with look-behinds: you need arbitrary-width look-behind to figure out whether the current chain of backslashes is even or odd in length in order to know what is being escaped or not). The latter issue was fixed by doing a single pass over the input string with no look-behind; the former by switching the delimiter to `'` (which inhibits interpolation). Fixes libwww-perl#44.
396f051
to
9e2ac53
Compare
And add a few tests for them.
9e2ac53
to
0b54570
Compare
I've implemented support for POSIX-style character classes now, along with additional tests. (The URI::Escape code doesn't seem to handle negated classes like |
This simplifies things a bit. Instead of coderefs, our cache now contains only compiled regex objects. The actual s/// op (there is only one now) lives outside, in the regular code of encode_entities.
It would be good to also test escaping character classes like |
@haarg this just needs additional tests? |
I added some tests after merging. |
Version 3.83 just released. |
encode_entities() generates and evals custom code at runtime (as a performance optimization). However, its code generation was too naive and certain characters could be used to break out of the character class regex:
$
(dollar sign) would trigger perl's variable interpolation]
and/
(the character class and regex delimiters, respectively)The latter two were usually escaped, but not if they were preceded by
\
in the input, even if that\
was itself escaped by another\
(NB: this is why it is generally a mistake to handle escaping logic with look-behinds: you need arbitrary-width look-behind to figure out whether the current chain of backslashes is even or odd in length in order to know what is being escaped or not).The latter issue was fixed by doing a single pass over the input string with no look-behind; the former by switching the delimiter to
'
(which inhibits interpolation).Fixes #44.