Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in recursive when using define function #486

Open
mattnenterprise opened this issue Jul 24, 2023 · 9 comments · May be fixed by #664
Open

Memory leak in recursive when using define function #486

mattnenterprise opened this issue Jul 24, 2023 · 9 comments · May be fixed by #664

Comments

@mattnenterprise
Copy link

mattnenterprise commented Jul 24, 2023

The recursive implementation leaks memory if it references itself and the parser definition is defined using the define function. See the following example that creates millions of parsers, but the memory is never released.

use chumsky::prelude::*;

#[derive(Debug, PartialEq)]
enum Chain {
    End,
    Link(char, Box<Chain>),
}

fn parser() -> impl Parser<char, Chain, Error = Simple<char>> {
    let mut chain = Recursive::<_, _, Simple<char>>::declare();

    chain.define(just('+')
        .then(chain.clone())
        .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
        .or_not()
        .map(|chain| chain.unwrap_or(Chain::End)));

    chain
}

fn main() {
    for _n in 1..100_000_000 {
        parser();
    }
}

I originally found this issue in the jaq filter parser here.

@zesterer
Copy link
Owner

zesterer commented Jul 26, 2023

Hi,

It seems like jaq is using 0.9. The 1.0 release is a from-scratch rewrite of chumsky. Do you experience the same issue with 1.0?

@mattnenterprise
Copy link
Author

jaq generates lots of compile errors when I try to upgrade from 0.9 to 1.0, and I'm not knowledgeable enough about the code base to upgrade them all.

I was able to reproduce the memory issue with the most current commit on main 6bb31a5. I did so with the following code.

use chumsky::prelude::*;

#[derive(Debug, PartialEq)]
enum Chain {
    End,
    Link(char, Box<Chain>),
}

fn parser<'a>() -> impl Parser<'a, &'a str, Chain, extra::Err<Simple<'a, char>>> {
    let mut chain = Recursive::declare();

    chain.define(just::<_, _, extra::Err<Simple<char>>>('+')
        .then(chain.clone())
        .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
        .or_not()
        .map(|chain| chain.unwrap_or(Chain::End)));

    chain
}

fn main() {
    for _n in 1..100_000_000 {
        parser();
    }
}

I actually based this example on an example in the recursive documentation here

@mattnenterprise
Copy link
Author

mattnenterprise commented Jul 26, 2023

So I installed miri to get more info about the memory leak. Both of the following were ran on main 6bb31a5

Here is the example code I ran with miri.

use chumsky::prelude::*;

#[derive(Debug, PartialEq)]
enum Chain {
    End,
    Link(char, Box<Chain>),
}

fn parser<'a>() -> impl Parser<'a, &'a str, Chain, extra::Err<Simple<'a, char>>> {
    let mut chain = Recursive::declare();

    chain.define(just::<_, _, extra::Err<Simple<char>>>('+')
        .then(chain.clone())
        .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
        .or_not()
        .map(|chain| chain.unwrap_or(Chain::End)));

    chain
}

fn main() {
    parser();
}

Here is the output from miri when running this example.

error: memory leaked: alloc819 (Rust heap, size: 32, align: 8), allocated here:
   --> /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9
    |
98  |         __rust_alloc(layout.size(), layout.align())
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: inside `std::alloc::alloc` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9: 98:52
    = note: inside `std::alloc::Global::alloc_impl` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:181:73: 181:86
    = note: inside `<std::alloc::Global as std::alloc::Allocator>::allocate` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:241:9: 241:39
    = note: inside `alloc::alloc::exchange_malloc` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:330:11: 330:34
    = note: inside `std::boxed::Box::<std::rc::RcBox<chumsky::recursive::Indirect<'_, '_, &str, Chain, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>>>::new` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:217:9: 217:20
    = note: inside `std::rc::Rc::<chumsky::recursive::Indirect<'_, '_, &str, Chain, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>>::new` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/rc.rs:397:27: 397:94
note: inside `chumsky::recursive::Recursive::<chumsky::recursive::Indirect<'_, '_, &str, Chain, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>>::declare`
   --> /home/mmccoy/code/chumsky/src/recursive.rs:128:42
    |
128 |               inner: RecursiveInner::Owned(RefC::new(Indirect {
    |  __________________________________________^
129 | |                 inner: OnceCell::new(),
130 | |             })),
    | |______________^
note: inside `parser::<'_>`
   --> examples/memory_leak.rs:10:21
    |
10  |     let mut chain = Recursive::declare();
    |                     ^^^^^^^^^^^^^^^^^^^^
note: inside `main`
   --> examples/memory_leak.rs:22:5
    |
22  |     parser();
    |     ^^^^^^^^

error: memory leaked: alloc881 (Rust heap, size: 24, align: 8), allocated here:
  --> /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9
   |
98 |         __rust_alloc(layout.size(), layout.align())
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: inside `std::alloc::alloc` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9: 98:52
   = note: inside `std::alloc::Global::alloc_impl` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:181:73: 181:86
   = note: inside `<std::alloc::Global as std::alloc::Allocator>::allocate` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:241:9: 241:39
   = note: inside `alloc::alloc::exchange_malloc` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:330:11: 330:34
   = note: inside `std::boxed::Box::<chumsky::combinator::Map<chumsky::combinator::OrNot<chumsky::combinator::Map<chumsky::combinator::Then<chumsky::primitive::Just<char, &str, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>, chumsky::recursive::Recursive<chumsky::recursive::Indirect<'_, '_, &str, Chain, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>>, char, Chain, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>, (char, Chain), [closure@examples/memory_leak.rs:14:14: 14:26]>>, std::option::Option<Chain>, [closure@examples/memory_leak.rs:16:14: 16:21]>>::new` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:217:9: 217:20
note: inside `parser::<'_>`
  --> examples/memory_leak.rs:12:5
   |
12 | /     chain.define(just::<_, _, extra::Err<Simple<char>>>('+')
13 | |         .then(chain.clone())
14 | |         .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
15 | |         .or_not()
16 | |         .map(|chain| chain.unwrap_or(Chain::End)));
   | |__________________________________________________^
note: inside `main`
  --> examples/memory_leak.rs:22:5
   |
22 |     parser();
   |     ^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

note: the evaluated program leaked memory, pass `-Zmiri-ignore-leaks` to disable this check

error: aborting due to 2 previous errors

If I define the parser with the recursive function like so there is no memory leak reported by miri.

use chumsky::prelude::*;

#[derive(Debug, PartialEq)]
enum Chain {
    End,
    Link(char, Box<Chain>),
}

fn parser<'a>() -> impl Parser<'a, &'a str, Chain, extra::Err<Simple<'a, char>>> {
    recursive(|chain|
        just::<_, _, extra::Err<Simple<char>>>('+')
        .then(chain.clone())
        .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
        .or_not()
        .map(|chain| chain.unwrap_or(Chain::End))
    )
}

fn main() {
    parser();
}

@CraftSpider
Copy link
Collaborator

CraftSpider commented Jul 26, 2023

I feel like I encountered this issue before and determined it would be pretty hard to fix, since clones of a Recursive::declare, which is strong, are themselves strong, and changing this could make it easy to accidentally drop your recursive parser:

let expr = Recursive::declare();

// Current Behavior: Strong clone - and since you now have an `Rc` containing itself, it leaks
let a = foo(expr.clone());

expr.define(a);
// If the clone was weak instead: Haha, whoops, you drop the original `expr` here and since the clone is weak it panics when you use the parser
expr.clone()

@zesterer
Copy link
Owner

This feels like it's effectively equivalent to full-on garbage collection in the general case, which is not really possible to solve in Rust while hiding it at an API level.

@CraftSpider
Copy link
Collaborator

Honestly, the easiest answer may be a weak_clone method that gets you a version of the parser to use inside itself.

@Zij-IT
Copy link
Contributor

Zij-IT commented Jul 28, 2023

Give me a couple hours (:sweat_smile:), but I think I have a solution for this!

@Zij-IT
Copy link
Contributor

Zij-IT commented Jul 28, 2023

Alright, so this is meant to be a rough-draft of a solution, but I feel like it helps to solve the original problem, and prevents the user from accidentally calling the wrong clone (weak_clone), which I feel would make it too easy to accidentally (and silently) create a memory leak.

@jkylling
Copy link

Hey! I've also encountered a memory leak with jaq because of this. I'm wondering if we can solve this use case by restricting and extending Chomsky's interface for recursive functions?

Within jaq there is some code which creates two mutually recursive parsers. The code is roughly

fn filter() -> impl Parser<Token, Spanned<Filter>, Error = Simple<Token>> + Clone {
  let parser1 = Recursive::declare();
  let parser2 = Recursive::declare();

  parser1.define(transform1(parser1.clone(), parser2.clone()));
  parser2.define(transform2(parser1.clone(), parser2.clone()));

  // At this point parser1 has incremented the strong reference count of parser2, and parser1
  // has incremented the strong reference count of parser2, so none of them will ever be dropped.
  return parser1
}

The issue is that parser1.clone() and parser2.clone() increment the strong counts of the OnceCells of the parsers. So if we change the code to use weak references when cloning instead we should not have any trouble. But not so quick. Let's say parser2.clone() in the above instead incremented the weak count of the OnceCell of parser2. Then when we get to the end of the filter function we drop parser2, since there is only one strong reference to it. When we then try to use parser1 we get a panic, as the weak reference to parser2 is no longer valid.

This suggests that parser1 and parser2 can only be dropped together. Maybe we can change the API to enforce this?

Let's add a guarding struct EitherParser which ensures that the parsers are always dropped together, and let's add recursive_bind of the following form:

#[allow(missing_docs)]
pub fn recursive_bind<
    'a,
    I: Clone,
    O1,
    P1: Parser<I, O1, Error=E> + 'a,
    P2,
    R: Into<EitherParser<P1, P2>>,
    F: FnOnce(Recursive<'a, I, O1, E>) -> R,
    E: Error<I>,
>(
    f: F,
) -> EitherParser<Recursive<'a, I, O1, E>, P2> {
    let mut left_parser = Recursive::declare();
    let either = f(
        Recursive(match &left_parser.0 {
            RecursiveInner::Owned(x) => RecursiveInner::Unowned(Rc::downgrade(x)),
            RecursiveInner::Unowned(_) => unreachable!(),
        })
    ).into();
    left_parser.define(either.0);
    EitherParser(left_parser, either.1)
}

#[allow(missing_docs)]
#[derive(Clone)]
pub struct EitherParser<P1, P2>(P1, P2);

impl <P1, P2> From<(P1, P2)> for EitherParser<P1, P2> {
    fn from(value: (P1, P2)) -> Self {
        EitherParser(value.0, value.1)
    }
}

impl <I, O, P1, P2> Parser<I, O> for  EitherParser<P1, P2>
where
    I: Clone,
    P1: Parser<I, O>,
{
    type Error = P1::Error;

    fn parse_inner<D: Debugger>(&self, debugger: &mut D, stream: &mut StreamOf<I, Self::Error>) -> PResult<I, O, Self::Error>
    where
        Self: Sized
    {
        self.0.parse_inner(debugger, stream)
    }

    fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf<I, Self::Error>) -> PResult<I, O, Self::Error> {
        self.0.parse_inner_verbose(d, s)
    }

    fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf<I, Self::Error>) -> PResult<I, O, Self::Error> {
        self.0.parse_inner_silent(d, s)
    }
}

impl <P1, P2> EitherParser<P1, P2> {
    fn flip(self) -> EitherParser<P2, P1> {
        EitherParser(self.1, self.0)
    }

    fn left(&self) -> &P1 {
        &self.0
    }

    fn right(&self) -> &P2 {
        &self.1
    }
}

Then we can define recursive and recursive_2 as

pub fn recursive<
    'a,
    I: Clone,
    O,
    P: Parser<I, O, Error = E> + 'a,
    F: FnOnce(Recursive<'a, I, O, E>) -> P,
    E: Error<I>,
>(
    f: F,
) -> Recursive<'a, I, O, E> {
    recursive_bind(|recursive| (f(recursive), ())).0
}

#[allow(missing_docs)]
pub fn recursive_2<
    'a,
    I: Clone,
    O1,
    O2,
    P1: Parser<I, O1, Error=E> + 'a,
    P2: Parser<I, O2, Error=E> + 'a,
    R: Into<EitherParser<P1, P2>>,
    F: FnOnce(Recursive<'a, I, O1, E>, Recursive<'a, I, O2, E>) -> R,
    E: Error<I>,
>(
    f: F,
) -> EitherParser<Recursive<'a, I, O1, E>, Recursive<'a, I, O2, E>> {
    recursive_bind(|left| recursive_bind(|right| f(left, right).into().flip()).flip())
}
...

We should also make Recursive::declare non-public, to prevent accidents with the API.

I've tested a modified version of jaq with chomsky 0.9.3 and this function, and the memory leak of jaq was gone (loop { jaq_parse::defs() } was sufficient to reproduce).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants