Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty parsers report wrong span #707

Open
smessmer opened this issue Dec 5, 2024 · 4 comments
Open

Empty parsers report wrong span #707

smessmer opened this issue Dec 5, 2024 · 4 comments

Comments

@smessmer
Copy link

smessmer commented Dec 5, 2024

Possibly related: #704

Chumsky 0.9.2 (did not find a 0.9.2 tag in the repository, so I built myself from revision e656589)

Example:

let parser = chumsky::primitive::empty::<Simple<char>>().map_with_span(|(), span| span);
let span = parser.parse("a");
assert_eq!(span, Ok(0..0));

I expect this to be the empty span since nothing was parsed, but it is actually 0..1:

assertion `left == right` failed
  left: Ok(0..1)
 right: Ok(0..0)

The same applies for other parsers that aren't supposed to consume any input, e.g. rewind:

let parser = chumsky::primitive::just::<char, _, Simple<char>>('a')
    .rewind()
    .map_with_span(|_, span| span);
let span = parser.parse("a");
assert_eq!(span, Ok(0..0));
assertion `left == right` failed
  left: Ok(0..1)
 right: Ok(0..0)
@smessmer smessmer changed the title Empty parsers use wrong span Empty parsers report wrong span Dec 5, 2024
@smessmer
Copy link
Author

smessmer commented Dec 6, 2024

This blocks things like writing a chumsky-to-nom adapter. I wanted to write such an adapter to gradually convert a project from nom to chumsky, but because some parsers report incorrect spans, this causes he wrong input length to be consumed by nom;

pub fn chumsky_to_nom<T>(
    parser: impl chumsky::Parser<char, T, Error = Simple<char>>,
) -> impl Fn(&str) -> IResult<&str, T, VerboseError<&str>> {
    let parser = parser.map_with_span(|parsed, span| (parsed, span));
    move |input| {
        let (parsed, span) = parser.parse(input).map_err(|err| {
            eprintln!("Failed to parse chumsky: {:?}", err);
            nom::Err::Error(VerboseError {
                errors: vec![(input, VerboseErrorKind::Context("chumsky"))],
            })
        })?;
        let forward_by_num_bytes = input.chars().take(span.end).map(|c| c.len_utf8()).sum();
        Ok((&input[forward_by_num_bytes..], parsed))
    }
}

@smessmer
Copy link
Author

smessmer commented Dec 7, 2024

repeated with zero occurrences is also affected;

let parser = chumsky::primitive::just::<char, _, Simple<char>>('a')
    .repeated()
    .map_with_span(|_, span| span);
let span = parser.parse("b");
assert_eq!(span, Ok(0..0));
assertion `left == right` failed
  left: Ok(0..1)
 right: Ok(0..0)

@benjamin-cates
Copy link
Contributor

benjamin-cates commented Jan 26, 2025

Not a solution or anything, but just adding context.

It seems like this bug was fixed between 0.9.3 and 1.0.0-alpha.8.

Here is some test code showing that the empty parser, a rewinded just, and a zero item repeated all produce the right input span with length zero:

// Tested on main branch as of January 26, 2025
#[test]
fn test_issue_707<'src>() {
    use crate::prelude::*;
    let parsers = [
        empty::<&str, extra::Default>().boxed(),
        just::<_, &str, extra::Default>('h').to(()).rewind().boxed(),
        just::<_, &str, extra::Default>('a').repeated().boxed(),
    ];
    for parser in parsers {
        let assert_span_is_0_0 = |parser: Boxed<'src, '_, &'src str, SimpleSpan, extra::Default>| {
            assert_eq!(parser.lazy().parse("hi").unwrap(), SimpleSpan::new(0,0));
        };
        assert_span_is_0_0(parser.clone().to_span().boxed());
        assert_span_is_0_0(parser.clone().map_with(|_, e| e.span()).boxed());
        assert_span_is_0_0(parser.clone().try_map(|_, span| Ok(span)).boxed());
    }
}

Unfortunately, there hasn't been a stable release in over a year, and I tested this on the current stable version on crates.io (0.9.3) and it is still broken :(

@zesterer
Copy link
Owner

zesterer commented Jan 26, 2025

0.9 is considered legacy at this stage: I highly recommend using 1.0, if you can.

That said, if you were personally invested in fixing this I think a solution would mean changing this code to look more like this:

let end = if self.offset == 0 {
    self.eoi.start()
} else {
    self
        .pull_until(self.offset.saturating_sub(1).max(start_offset))
        .as_ref()
        .map(|(_, s)| s.end())
        .unwrap_or_else(|| self.eoi.end())
};

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants