
Possible memory leak #1409

Open
sergii-mamedov opened this issue Nov 22, 2024 · 1 comment

@sergii-mamedov

Hi @JosepSampe,

I think I found two memory leaks in lithops:

  1. FaaSInvoker and self.executor
    You create a ThreadPoolExecutor

    self.executor = ThreadPoolExecutor(invoke_pool_threads)

    and use it here
    future = self.executor.submit(self._invoke_task, job, call_ids_range)

    but you never call self.executor.shutdown(), so the worker threads keep running.
    In our case, the worker that uses lithops is a daemon. Because we reuse an existing ServerlessExecutor, every new dataset processed on AWS Lambda creates 64 new threads, which adds up to hundreds of lingering threads after a couple of hours.

  2. FaaSInvoker and _start_async_invokers
    This is much less of a problem, but it also creates two new threads each time an executor is created

    lithops/lithops/invokers.py

    Lines 332 to 336 in c4ecc74

    for inv_id in range(self.ASYNC_INVOKERS):
        p = threading.Thread(target=invoker_process, args=(inv_id,))
        self.invokers.append(p)
        p.daemon = True
        p.start()

    Yes, there is a stop() method below, but it is only called automatically for the FaaSRemoteInvoker class. If I create executor = lithops.ServerlessExecutor(config=lithops_config, ...), then stop() is only called when the executor is used as a context manager:
    def __exit__(self, exc_type, exc_value, traceback):
        """ Context manager method """
        self.job_monitor.stop()
        self.invoker.stop()
        self.compute_handler.clear()

    I'm not insisting that this is a memory leak, but even the documentation has many examples where FunctionExecutor is used without with.
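The first pattern is easy to reproduce with a few lines of stdlib Python (a minimal sketch, not Lithops code; the names here are illustrative): each "invoker" keeps a reference to its pool, like self.executor on FaaSInvoker, so the idle worker threads never exit until shutdown() is called.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# References kept alive, as self.executor is on the invoker object
pools = []

def new_invoker_pool(threads=4):
    ex = ThreadPoolExecutor(max_workers=threads)
    for _ in range(threads):
        ex.submit(lambda: None)   # work items spawn worker threads
    pools.append(ex)              # reference kept -> threads stay alive
    return ex

baseline = threading.active_count()
new_invoker_pool()
new_invoker_pool()
leaked = threading.active_count() - baseline
print(f"extra threads while pools are referenced: {leaked}")  # at least 2

# The fix: shut each pool down once the invoker is done with it.
for ex in pools:
    ex.shutdown(wait=True)        # joins the worker threads
print(threading.active_count() - baseline)  # 0
```

Every call to new_invoker_pool() adds worker threads that idle on the pool's work queue; only shutdown(wait=True) joins them and returns the thread count to the baseline.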

P.S. Now the main developer of METASPACE is @lmacielvieira, so I'm tagging him as well @Bisho2122


JosepSampe commented Jan 11, 2025

Hi @sergii-mamedov @lmacielvieira

After extensive testing of Lithops, I’ve identified the following findings:

The self.executor = ThreadPoolExecutor(invoke_pool_threads) pool is initialized once, creating a fixed-size pool of reusable threads. These threads persist throughout the execution, but the pool does not grow at any time.

The self.invoker.stop() method is used to stop the async invokers. However, as you pointed out, it is only invoked when using the context manager. Despite this, the thread pool created in the async invokers remains fixed in size, as in the previous case.
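The async-invoker lifecycle can be modeled with a small stdlib sketch (illustrative names, not the actual Lithops internals): daemon threads that poll until stop() is called, which is what __exit__ does via self.invoker.stop().

```python
import threading

class AsyncInvokers:
    ASYNC_INVOKERS = 2

    def __init__(self):
        self._stop_event = threading.Event()
        self.invokers = []
        for inv_id in range(self.ASYNC_INVOKERS):
            p = threading.Thread(target=self._invoker_process, args=(inv_id,))
            p.daemon = True
            self.invokers.append(p)
            p.start()

    def _invoker_process(self, inv_id):
        # Wake up periodically; a real invoker would drain a job queue here.
        while not self._stop_event.wait(timeout=0.01):
            pass

    def stop(self):
        self._stop_event.set()
        for p in self.invokers:
            p.join()

baseline = threading.active_count()
inv = AsyncInvokers()
running = threading.active_count() - baseline
print(running)  # 2 invoker threads
inv.stop()      # never reached when no `with` block triggers __exit__
print(threading.active_count() - baseline)  # 0
```

If stop() is never called, the two daemon threads linger for the life of the process; they do not multiply on their own, but every new executor adds another pair.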

After several experiments running map() operations for hours, I verified that the number of threads does not grow without bound. However, I found that it can grow to more than 500 because of this hardcoded line:

with ThreadPoolExecutor(max_workers=250) as executor:
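A configuration-driven bound, instead of the hardcoded 250, could look like the sketch below (the invoke_pool_threads key and helper name are hypothetical, not the actual change in the PR): cap the pool by the configured size and by the number of tasks, so a small job never spawns hundreds of idle workers.

```python
from concurrent.futures import ThreadPoolExecutor

def bounded_pool_size(config, n_tasks, default=64, ceiling=250):
    # Never exceed the configured size, the task count, or the old ceiling
    configured = config.get('invoke_pool_threads', default)
    return max(1, min(configured, n_tasks, ceiling))

print(bounded_pool_size({}, 3))                             # 3
print(bounded_pool_size({'invoke_pool_threads': 8}, 1000))  # 8

with ThreadPoolExecutor(max_workers=bounded_pool_size({}, 3)) as executor:
    results = list(executor.map(lambda x: x * 2, range(3)))
print(results)  # [0, 2, 4]
```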

For the tests I used the py-spy tool, for example with: py-spy top --pid $(pgrep -f "python3 examples/map.py")

To improve this, I created PR #1414 with fixes that substantially reduce the number of threads, so that the thread count in the async invokers depends on your configuration.

I also verified that, in the other components of the code where we create threads, they are properly shut down.
