After extensive testing of Lithops, I’ve identified the following findings:
The self.executor = ThreadPoolExecutor(invoke_pool_threads) is initialized once, creating a fixed-size pool of reusable threads. These threads persist throughout the execution, and do not grow at any time.
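That fixed-size behavior can be illustrated with a minimal standalone sketch (plain `concurrent.futures`, not Lithops code; the pool size of 4 here just stands in for `invoke_pool_threads`):

```python
# Minimal sketch (not Lithops code): a ThreadPoolExecutor with a fixed
# max_workers never grows beyond that bound, but its worker threads
# persist until shutdown() is called.
import threading
from concurrent.futures import ThreadPoolExecutor

baseline = threading.active_count()
pool = ThreadPoolExecutor(max_workers=4)  # stands in for invoke_pool_threads

# Submit far more tasks than workers; the pool still tops out at 4 threads.
futures = [pool.submit(lambda x: x * 2, i) for i in range(100)]
results = [f.result() for f in futures]

extra = threading.active_count() - baseline
print(extra)  # at most 4: the pool is fixed-size

pool.shutdown(wait=True)  # without this, the 4 worker threads would linger
```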
The `self.invoker.stop()` method is used to stop the async invokers. However, as you pointed out, it is only invoked when using the context manager. Despite this, the thread pool created in the async invokers remains fixed in size, as in the previous case.
After several experiments running `map()` operations for hours, I verified that the number of threads does not grow without bound. However, I found that it can grow to more than 500 due to this hardcoded line:
For the tests I used the `py-spy` tool, for example with: `py-spy top --pid $(pgrep -f "python3 examples/map.py")`
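As a complement to attaching `py-spy` from the outside, the live thread count can also be checked from inside the process with only the standard library; a small sketch (`dump_threads` is a made-up helper name):

```python
# Sketch: inspecting live threads from within the process itself,
# as a complement to attaching py-spy from outside.
import threading

def dump_threads():
    """Return (count, names) of all currently alive threads."""
    alive = threading.enumerate()
    return len(alive), [t.name for t in alive]

count, names = dump_threads()
print(count)   # 1 in a fresh interpreter (just MainThread)
print(names)
```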
To improve this, I created PR #1414 with fixes that substantially reduce the thread count, so that the number of threads in the async invokers depends on your configuration.
I also verified that threads are properly closed in the rest of the components of the code where we create them.
Hi @JosepSampe,
I think I found two memory leaks in lithops:
FaaSInvoker and self.executor
You create a `ThreadPoolExecutor` (lithops/lithops/invokers.py, line 305 in c4ecc74) and use it here (lithops/lithops/invokers.py, line 456 in c4ecc74),
but you don't call `self.executor.shutdown()`, so the threads remain hanging. In our case, the worker using Lithops is a daemon. This means that for each new dataset for AWS Lambda, we create 64 new threads, because we are reusing an existing `ServerlessExecutor`. This leads to hundreds of hanging threads after a couple of hours.

FaaSInvoker and _start_async_invokers
This is much less of a problem, but it also creates two threads each time (lithops/lithops/invokers.py, lines 332 to 336 in c4ecc74).
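The accumulation pattern can be sketched in isolation (illustrative only; the queue-blocked workers below are a stand-in for the async invoker threads, not the actual Lithops code):

```python
# Sketch of the accumulation pattern: each "invoker start" spawns fresh
# threads that block on a queue and are never joined, so repeated starts
# pile threads up. (Illustrative only; not the actual Lithops code.)
import queue
import threading

def start_async_invokers(num=2):
    """Spawn `num` threads that wait forever on a token queue."""
    q = queue.Queue()
    for _ in range(num):
        t = threading.Thread(target=q.get, daemon=True)
        t.start()
    return q

baseline = threading.active_count()
queues = [start_async_invokers() for _ in range(10)]  # 10 "datasets"
leaked = threading.active_count() - baseline
print(leaked)  # 20: two new threads per call, none ever stopped

# Releasing them requires feeding each queue one token per thread:
for q in queues:
    q.put(None)
    q.put(None)
```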
Yes, there is a `stop()` method below, but it is called automatically only for `class FaaSRemoteInvoker`. If I call `executor = lithops.ServerlessExecutor(config=lithops_config, ...)`, then the `stop()` method is only invoked when using a context manager (lithops/lithops/executors.py, lines 159 to 163 in fdb6432):
I'm not insisting that there is a memory leak here, but even in the documentation there are many examples where `FunctionExecutor` is used without `with`.

P.S. The main developer of METASPACE is now @lmacielvieira, so I'm tagging him as well as @Bisho2122.