While recognising that this is a divergence from the spec, does it affect any real-world use for you? As in, do you have some content that is actually using a lone \r as a line break?
No real-world impact at the moment, beyond the time spent debugging it.
For full context: I'm working on a component that extracts YAML fields from a larger document for passing to a third-party tool, and needs to be able to correlate line and column number from that extracted content with the original document.
I'd hoped I could reuse much of the existing logic, but given the need to know about the effects on column position of expanding, for instance, escape codes, for now I'm effectively replacing the whole "compose" layer. In the rewritten version, I keep track of positions, so I can emit a list of coordinates with the composed text that lets me align it to the original source doc.
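To illustrate the kind of position tracking described here, the following is a minimal sketch of an offset-to-line/column lookup that treats CR, LF and CRLF each as a single line break, as the YAML spec does. The names `lineStarts` and `positionAt` are hypothetical and not part of the yaml package; this is not the actual component, only one way such a mapping could look.

```typescript
// Hypothetical helpers for illustration; not part of the yaml package.
interface Position {
  line: number; // 1-based
  col: number;  // 1-based
}

// Collect the offset at which each line starts, treating
// CR, LF and CRLF each as exactly one line break.
function lineStarts(source: string): number[] {
  const starts = [0];
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\r') {
      // A CR followed by LF is a single CRLF break: consume the LF too.
      if (source[i + 1] === '\n') i++;
      starts.push(i + 1);
    } else if (ch === '\n') {
      starts.push(i + 1);
    }
  }
  return starts;
}

// Binary search for the last line start that is <= offset.
function positionAt(starts: number[], offset: number): Position {
  let lo = 0;
  let hi = starts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (starts[mid] <= offset) lo = mid;
    else hi = mid - 1;
  }
  return { line: lo + 1, col: offset - starts[lo] + 1 };
}

// Usage: mixed CR, CRLF and LF line endings in one source string.
const starts = lineStarts('a\rb\r\nc\nd');
console.log(positionAt(starts, 7)); // position of "d"
```

Building the line-start table once keeps each lookup logarithmic, which matters if every emitted coordinate has to be mapped back to the source.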
My set of unit tests/edge cases for this include a mixture of line endings, and those are failing, which is why I had to track the problem back through the layers (right to the lexer).
If there's no intent to fix this, for now I just need to ensure all my test cases use only \n or \r\n, and that I document that as a restriction. I think mixing those two in a file would also be supported, as most/all of the problems I've encountered so far seem to revolve around \r.
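As an alternative to restricting inputs, one possible workaround is to rewrite each lone \r to \n before handing the text to the parser. The sketch below assumes that is acceptable for the documents involved (that is, no lone \r needs to survive verbatim); `normalizeLoneCR` is a hypothetical helper, not something the yaml package provides. Because each replacement is one character for one character, offsets into the normalized text still line up with the original.

```typescript
// Hypothetical pre-processing step; not part of the yaml package.
// Replaces every CR that is NOT followed by LF with a single LF.
// CRLF pairs are left untouched, and the string length is unchanged,
// so character offsets remain valid against the original source.
function normalizeLoneCR(source: string): string {
  return source.replace(/\r(?!\n)/g, '\n');
}

// Usage: lone CRs become LFs, the CRLF pair is preserved.
const original = 'a\rb\r\nc';
const normalized = normalizeLoneCR(original);
console.log(JSON.stringify(normalized));
console.log(normalized.length === original.length);
```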
Describe the bug
Single carriage return line endings (\r) are treated as non-breaking characters. This is contrary to the YAML spec, which explicitly calls out CR, LF and CRLF as the (only) possible line breaks.
A similar problem afflicts other parsers, for example jbeder/yaml-cpp#986
To Reproduce
Note the output:
Expected behaviour
The extra tokens in the second case are a result of \r being treated like any other (non-breaking) character, meaning the tokens are split only at the first \n and not reassembled:
(..., a, \n, \x1F, \r\r\r b)
Versions (please complete the following information):
Environment: Node.js 20.17.0
yaml: 2.6.1
Additional context
There are several places this seems to manifest, for example:
https://github.com/eemeli/yaml/blob/main/src/parse/lexer.ts#L223
https://github.com/eemeli/yaml/blob/main/src/parse/lexer.ts#L191