Memory leak in `recursive` when using `define` function #486
Hi, it seems like …
I was able to reproduce the memory issue with the latest commit on main, 6bb31a5, using the following code:

```rust
use chumsky::prelude::*;

#[derive(Debug, PartialEq)]
enum Chain {
    End,
    Link(char, Box<Chain>),
}

fn parser<'a>() -> impl Parser<'a, &'a str, Chain, extra::Err<Simple<'a, char>>> {
    let mut chain = Recursive::declare();
    chain.define(
        just::<_, _, extra::Err<Simple<char>>>('+')
            .then(chain.clone())
            .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
            .or_not()
            .map(|chain| chain.unwrap_or(Chain::End)),
    );
    chain
}

fn main() {
    // Build and immediately drop a huge number of parsers; the memory is never released.
    for _n in 1..100_000_000 {
        parser();
    }
}
```

I based this example on an example in the recursive documentation here.
So I installed miri to get more information about the memory leak. Both of the following were run on main 6bb31a5. Here is the example code I ran with miri:

```rust
use chumsky::prelude::*;

#[derive(Debug, PartialEq)]
enum Chain {
    End,
    Link(char, Box<Chain>),
}

fn parser<'a>() -> impl Parser<'a, &'a str, Chain, extra::Err<Simple<'a, char>>> {
    let mut chain = Recursive::declare();
    chain.define(
        just::<_, _, extra::Err<Simple<char>>>('+')
            .then(chain.clone())
            .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
            .or_not()
            .map(|chain| chain.unwrap_or(Chain::End)),
    );
    chain
}

fn main() {
    // A single parser is enough for miri to inspect what is left allocated at exit.
    parser();
}
```

Here is the output from miri when running this example.
If I define the parser with the `recursive` function instead:

```rust
use chumsky::prelude::*;

#[derive(Debug, PartialEq)]
enum Chain {
    End,
    Link(char, Box<Chain>),
}

fn parser<'a>() -> impl Parser<'a, &'a str, Chain, extra::Err<Simple<'a, char>>> {
    // `recursive` hands the closure a handle to the parser being defined.
    recursive(|chain| {
        just::<_, _, extra::Err<Simple<char>>>('+')
            .then(chain.clone())
            .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
            .or_not()
            .map(|chain| chain.unwrap_or(Chain::End))
    })
}

fn main() {
    parser();
}
```
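A toy model may help explain why the `recursive` form is expected to behave differently from `declare`/`define`. It assumes, as the `recursive_bind` proposal in this thread does via `RecursiveInner::Unowned(Rc::downgrade(x))`, that the closure receives a weak (unowned) handle while the caller receives the only strong one. This is a std-only sketch; `Rec`, `recursive`, `chain` and `freed_after_drop` are hypothetical names, not chumsky's API, and a "parser" is reduced to a function returning how many characters it consumed.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical toy model, not chumsky's types: a body returns how many chars it consumed.
type Body = Box<dyn Fn(&str) -> usize>;
type Slot = RefCell<Option<Body>>;

enum Rec {
    Owned(Rc<Slot>),     // returned to the caller; keeps the definition alive
    Unowned(Weak<Slot>), // handed to the closure; never forms a cycle
}

impl Rec {
    fn parse(&self, input: &str) -> usize {
        let slot = match self {
            Rec::Owned(rc) => Rc::clone(rc),
            Rec::Unowned(weak) => weak.upgrade().expect("recursive parser used after being dropped"),
        };
        let body = slot.borrow();
        let f = body.as_ref().expect("recursive parser used before being defined");
        f(input)
    }
}

// Like `recursive(|chain| ...)`: the closure only ever sees a weak handle.
fn recursive(f: impl FnOnce(Rec) -> Body) -> Rec {
    let slot: Rc<Slot> = Rc::new(RefCell::new(None));
    let body = f(Rec::Unowned(Rc::downgrade(&slot)));
    *slot.borrow_mut() = Some(body);
    Rec::Owned(slot)
}

// Mirrors the `Chain` grammar: counts a run of leading '+' characters.
fn chain() -> Rec {
    recursive(|chain| {
        let body: Body = Box::new(move |input: &str| match input.strip_prefix('+') {
            Some(rest) => 1 + chain.parse(rest),
            None => 0,
        });
        body
    })
}

// Drops the parser and reports whether its definition was actually freed.
fn freed_after_drop(parser: Rec) -> bool {
    let probe = match &parser {
        Rec::Owned(rc) => Rc::downgrade(rc),
        Rec::Unowned(weak) => weak.clone(),
    };
    drop(parser);
    probe.upgrade().is_none()
}

fn main() {
    println!("{}", chain().parse("+++"));
    println!("freed: {}", freed_after_drop(chain()));
}
```

Because the closure's handle is weak, the definition is not part of a reference cycle, so dropping the returned strong handle frees it.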
I feel like I encountered this issue before and determined it would be pretty hard to fix, since clones of a …

```rust
let mut expr = Recursive::declare();
// Current behavior: strong clone. Since you now have an `Rc` containing itself, it leaks.
// (`foo` stands for any combinator that wraps the clone.)
let a = foo(expr.clone());
expr.define(a);
// If the clone were weak instead: haha, whoops, you drop the original `expr` here,
// and since the returned clone is weak, it panics when you use the parser.
expr.clone()
```
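The two alternatives in the comment above can be reproduced with plain `std::rc` types. In this std-only sketch, `Node`, `Link` and `declare` are hypothetical stand-ins for a declared parser and the clone its own definition holds: a strong self-clone keeps the allocation alive after the last handle is dropped, while a weak one lets it be freed, leaving any surviving weak clone unable to reach the definition (the point where a real parser would panic).

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical stand-in for the clone a parser's own definition holds.
#[allow(dead_code)]
enum Link {
    Strong(Rc<Node>), // like today's `expr.clone()`: keeps `expr` alive
    Weak(Weak<Node>), // the alternative: does not keep `expr` alive
}

// Hypothetical stand-in for a declared-but-not-yet-defined parser.
struct Node {
    body: RefCell<Option<Link>>,
}

fn declare() -> Rc<Node> {
    Rc::new(Node { body: RefCell::new(None) })
}

// Defines `expr` with a strong clone of itself, drops the handle,
// and reports whether the allocation is still alive afterwards.
fn strong_clone_leaks() -> bool {
    let expr = declare();
    *expr.body.borrow_mut() = Some(Link::Strong(Rc::clone(&expr)));
    let probe = Rc::downgrade(&expr);
    drop(expr);
    probe.upgrade().is_some() // still alive: the Rc contains itself
}

// Same, but with a weak clone: dropping the handle frees the node, so a
// weak clone handed out earlier can no longer reach the definition.
fn weak_clone_dangles() -> bool {
    let expr = declare();
    *expr.body.borrow_mut() = Some(Link::Weak(Rc::downgrade(&expr)));
    let survivor = Rc::downgrade(&expr); // the weak clone given to the user
    drop(expr);
    survivor.upgrade().is_none()
}

fn main() {
    println!("strong clone leaks: {}", strong_clone_leaks());
    println!("weak clone dangles: {}", weak_clone_dangles());
}
```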
This feels like it's effectively equivalent to full-on garbage collection in the general case, which is not really possible to solve in Rust while hiding it at an API level.
Honestly, the easiest answer may be a …
Give me a couple hours (:sweat_smile:), but I think I have a solution for this!
Alright, so this is meant to be a rough draft of a solution, but I feel like it helps solve the original problem and prevents the user from accidentally calling the wrong clone.
Hey! I've also encountered a memory leak with jaq because of this. I'm wondering if we can solve this use case by restricting and extending chumsky's interface for recursive functions. Within jaq there is some code that creates two mutually recursive parsers. The code is roughly:

```rust
fn filter() -> impl Parser<Token, Spanned<Filter>, Error = Simple<Token>> + Clone {
    let mut parser1 = Recursive::declare();
    let mut parser2 = Recursive::declare();
    parser1.define(transform1(parser1.clone(), parser2.clone()));
    parser2.define(transform2(parser1.clone(), parser2.clone()));
    // At this point parser1's definition holds strong references to parser1 and parser2,
    // and parser2's definition holds strong references to both as well,
    // so none of them will ever be dropped.
    parser1
}
```

The issue is that … This suggests that … Let's add a guarding struct:

```rust
#[allow(missing_docs)]
pub fn recursive_bind<
    'a,
    I: Clone,
    O1,
    P1: Parser<I, O1, Error = E> + 'a,
    P2,
    R: Into<EitherParser<P1, P2>>,
    F: FnOnce(Recursive<'a, I, O1, E>) -> R,
    E: Error<I>,
>(
    f: F,
) -> EitherParser<Recursive<'a, I, O1, E>, P2> {
    let mut left_parser = Recursive::declare();
    // Hand `f` a weak (unowned) handle so the definition cannot keep itself alive.
    let either: EitherParser<P1, P2> = f(Recursive(match &left_parser.0 {
        RecursiveInner::Owned(x) => RecursiveInner::Unowned(Rc::downgrade(x)),
        RecursiveInner::Unowned(_) => unreachable!(),
    }))
    .into();
    // The returned strong handle owns the definition; `either.1` is carried along.
    left_parser.define(either.0);
    EitherParser(left_parser, either.1)
}

// Parses with the first parser while keeping the second one alive.
#[allow(missing_docs)]
#[derive(Clone)]
pub struct EitherParser<P1, P2>(P1, P2);

impl<P1, P2> From<(P1, P2)> for EitherParser<P1, P2> {
    fn from(value: (P1, P2)) -> Self {
        EitherParser(value.0, value.1)
    }
}

impl<I, O, P1, P2> Parser<I, O> for EitherParser<P1, P2>
where
    I: Clone,
    P1: Parser<I, O>,
{
    type Error = P1::Error;

    fn parse_inner<D: Debugger>(&self, debugger: &mut D, stream: &mut StreamOf<I, Self::Error>) -> PResult<I, O, Self::Error>
    where
        Self: Sized,
    {
        self.0.parse_inner(debugger, stream)
    }

    fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf<I, Self::Error>) -> PResult<I, O, Self::Error> {
        self.0.parse_inner_verbose(d, s)
    }

    fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf<I, Self::Error>) -> PResult<I, O, Self::Error> {
        self.0.parse_inner_silent(d, s)
    }
}

impl<P1, P2> EitherParser<P1, P2> {
    fn flip(self) -> EitherParser<P2, P1> {
        EitherParser(self.1, self.0)
    }

    fn left(&self) -> &P1 {
        &self.0
    }

    fn right(&self) -> &P2 {
        &self.1
    }
}
```

Then we can define:

```rust
pub fn recursive<
    'a,
    I: Clone,
    O,
    P: Parser<I, O, Error = E> + 'a,
    F: FnOnce(Recursive<'a, I, O, E>) -> P,
    E: Error<I>,
>(
    f: F,
) -> Recursive<'a, I, O, E> {
    // A single recursive parser is the special case with no second parser.
    recursive_bind(|recursive| (f(recursive), ())).0
}

#[allow(missing_docs)]
pub fn recursive_2<
    'a,
    I: Clone,
    O1,
    O2,
    P1: Parser<I, O1, Error = E> + 'a,
    P2: Parser<I, O2, Error = E> + 'a,
    R: Into<EitherParser<P1, P2>>,
    F: FnOnce(Recursive<'a, I, O1, E>, Recursive<'a, I, O2, E>) -> R,
    E: Error<I>,
>(
    f: F,
) -> EitherParser<Recursive<'a, I, O1, E>, Recursive<'a, I, O2, E>> {
    // Nest two binds; `flip` swaps sides so each level binds its own left parser.
    recursive_bind(|left| {
        recursive_bind(|right| {
            let both: EitherParser<P1, P2> = f(left, right).into();
            both.flip()
        })
        .flip()
    })
}
// ...
```

We should also make … I've tested a modified version of jaq with chumsky 0.9.3 and this function, and the memory leak in jaq was gone.
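The `recursive_2` idea can be modeled with plain `Rc`/`Weak`. In this std-only sketch (`WeakRec`, `Pair`, `recursive_2` and `alternating` are hypothetical names, not chumsky's API), each definition closure receives only weak handles, and the caller gets a struct that owns both strong handles, parsing with the first while keeping the second alive, much as `EitherParser` delegates to `self.0` while holding `self.1`. The grammar `a := '+' b | empty`, `b := '-' a | empty` stands in for jaq's two mutually recursive parsers.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical toy model: a body returns how many chars it consumed.
type Body = Box<dyn Fn(&str) -> usize>;
type Slot = RefCell<Option<Body>>;

fn run(slot: &Slot, input: &str) -> usize {
    let body = slot.borrow();
    let f = body.as_ref().expect("parser used before being defined");
    f(input)
}

// Non-owning handle given to the definition closures.
struct WeakRec(Weak<Slot>);

impl WeakRec {
    fn parse(&self, input: &str) -> usize {
        let slot = self.0.upgrade().expect("sibling parser was dropped");
        run(&slot, input)
    }
}

// Owns both definitions: parses with the first, keeps the second alive.
struct Pair(Rc<Slot>, Rc<Slot>);

impl Pair {
    fn parse(&self, input: &str) -> usize {
        run(&self.0, input)
    }
}

// Two mutually recursive definitions; the closures only see weak handles.
fn recursive_2(f: impl FnOnce(WeakRec, WeakRec) -> (Body, Body)) -> Pair {
    let first: Rc<Slot> = Rc::new(RefCell::new(None));
    let second: Rc<Slot> = Rc::new(RefCell::new(None));
    let (b1, b2) = f(WeakRec(Rc::downgrade(&first)), WeakRec(Rc::downgrade(&second)));
    *first.borrow_mut() = Some(b1);
    *second.borrow_mut() = Some(b2);
    Pair(first, second)
}

// Returns the length of the alternating "+-+-..." prefix.
fn alternating() -> Pair {
    recursive_2(|a, b| {
        let body_a: Body = Box::new(move |s: &str| match s.strip_prefix('+') {
            Some(rest) => 1 + b.parse(rest),
            None => 0,
        });
        let body_b: Body = Box::new(move |s: &str| match s.strip_prefix('-') {
            Some(rest) => 1 + a.parse(rest),
            None => 0,
        });
        (body_a, body_b)
    })
}

fn main() {
    let pair = alternating();
    println!("{}", pair.parse("+-+-x"));
    let probes = (Rc::downgrade(&pair.0), Rc::downgrade(&pair.1));
    drop(pair);
    println!("both freed: {}", probes.0.upgrade().is_none() && probes.1.upgrade().is_none());
}
```

The trade-off from the weak-clone comment still applies: if only the first strong handle were kept and the second dropped, parsing through `b` would hit the `expect` and panic, which is why the pair owns both.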
The `recursive` implementation leaks memory if the parser references itself and its definition is provided using the `define` function. See the following example, which creates millions of parsers; the memory is never released. I originally found this issue in the jaq filter parser here.