Hi, thanks for the amazing library! I’m building a SQL-like lexer and aiming to remain zero-copy while also handling escape sequences like …
I’ve noticed that in many Chumsky examples (such as JSON parsing), the parser uses …
My challenge: …
Questions: …
I suspect that if we truly need to modify bytes (e.g., collapse …
I currently have something like this (also have the …):

    #[derive(Debug, Clone)]
    struct Stringg(String);

    pub trait YeParser<'a, T>: Parser<'a, &'a str, T, extra::Err<Rich<'a, char>>> {}
    impl<'a, P, T> YeParser<'a, T> for P where P: Parser<'a, &'a str, T, extra::Err<Rich<'a, char>>> {}

    pub fn stringg<'a>() -> impl YeParser<'a, Stringg> {
        let escape = just('\\')
            .ignore_then(choice((
                ...
                just('\\'),
                just('n').to('\n'),
                ...
            )))
            .map(|c| Escape(c.to_string()));
        let inner = none_of("\\\'")
            .repeated()
            .at_least(1)
            .collect::<String>();
        let content = inner.or(escape.map(|e| e.0));
        content
            .repeated()
            .at_least(1)
            .collect::<Vec<String>>()
            .map(|s| s.join(""))
            .map(Stringg)
            .delimited_by(just('\''), just('\''))
    }

Appreciate any guidance!
Replies: 2 comments
Chumsky is not, in general, set up to modify an input's memory during parsing and, in fact, the library relies on being able to backtrack to observe input multiple times.
Turning \ + n into an ASCII newline character would leave the string with an unoccupied byte. By convention one might replace the unoccupied byte with a DEL, but this seems like a silly solution that is unlikely to be meaningful for most text systems, and it only works with ASCII.
Going back to chumsky, one pattern I've seen is to have two parsers: one for strings with no escape characters - which does not allocate - and another for strings with escape characters, which does allocate, to perform character replacement. They can be combined together using …

    let unescaped = ... ; // outputs `&str`
    let escaped = ... ; // outputs `String`
    // First, try parsing with the unescaped version to skip allocating if we can
    let string = unescaped.map(Cow::Borrowed)
        // if this doesn't work, fall back to allocating
        .or(escaped.map(Cow::Owned));

All this aside, I'd recommend that you benchmark. People often fret about the 'cost of allocation', but more often than not allocators get a bad reputation because their cost is mixed up with other features of high-level languages like dynamic dispatch, a lack of inlining, cache incoherence, etc. Most of these issues don't apply in Rust, and small allocations in particular are often very quick when using a decent modern allocator.
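For illustration, here is a minimal sketch of the same borrow-when-possible idea in plain Rust, using only the standard library rather than chumsky combinators. The `unescape` helper and the escape set it accepts (`\\`, `\'`, `\n`) are assumptions made up for this example; they are not part of chumsky or of the code in the question.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of chumsky): resolves `\\`, `\'` and `\n`
/// in the body of a string literal whose quotes were already stripped.
/// Returns `None` for an unknown escape or a trailing lone backslash.
fn unescape(body: &str) -> Option<Cow<'_, str>> {
    // Fast path: no backslash means no escapes, so borrow the input as-is.
    let Some(first) = body.find('\\') else {
        return Some(Cow::Borrowed(body));
    };

    // Slow path: copy the escape-free prefix, then rewrite the remainder.
    let mut out = String::with_capacity(body.len());
    out.push_str(&body[..first]);
    let mut chars = body[first..].chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // The character after a backslash selects the replacement.
        match chars.next()? {
            '\\' => out.push('\\'),
            '\'' => out.push('\''),
            'n' => out.push('\n'),
            _ => return None,
        }
    }
    Some(Cow::Owned(out))
}
```

Literals without a backslash come back as `Cow::Borrowed`, so only strings that actually contain an escape pay for an allocation.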
Appreciate you taking the time! I’ve settled on having one non-allocated variant and one allocated variant (for interpolated/non-interpolated cases). Instead of using …
Chumsky has been a pleasure to work with, great work!