I did a lot of research and couldn't find a way to fix the problem itself. But there is a decent workaround that prevents the memory blowout for a small cost, which is especially worthwhile for long-running server-side code.
The workaround is to restart individual worker processes after a fixed number of tasks. The Pool class in Python accepts maxtasksperchild as an argument: specifying maxtasksperchild=1000 limits each child process to 1000 tasks, after which the pool retires it and spawns a fresh worker in its place. By choosing the maximum prudently, you can balance the peak memory consumed against the startup cost of restarting the backend processes. The Pool is constructed as:
pool = mp.Pool(processes=2, maxtasksperchild=1000)
I am putting my full solution here so it can be of use to others!
import multiprocessing as mp
import time

def calculate(num):
    # build a throwaway list so each task allocates a noticeable amount of memory
    l = [i * i for i in range(num)]
    s = sum(l)
    del l  # explicitly deleting the list is optional
    return s

if __name__ == "__main__":
    # fix is in the following line #
    pool = mp.Pool(processes=2, maxtasksperchild=1000)
    time.sleep(5)  # pauses let you watch memory usage in a process monitor
    print("launching calculation")
    num_tasks = 1000
    tasks = [pool.apply_async(calculate, (i,)) for i in range(num_tasks)]
    for f in tasks:
        print(f.get(5))  # wait up to 5 seconds for each result
    print("calculation finished")
    time.sleep(10)
    print("closing pool")
    pool.close()
    print("closed pool")
    print("joining pool")
    pool.join()
    print("joined pool")
    time.sleep(5)
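If you want to confirm that workers really are being recycled, one quick way is to have each task report the PID of the worker that ran it. Below is a minimal sketch along those lines; the report helper and the deliberately tiny maxtasksperchild=5 are illustrative choices of mine, not part of the fix above. With chunksize=1 every item counts as a separate task, so seeing more than two distinct PIDs shows that the pool restarted its children.

import multiprocessing as mp
import os

def report(_):
    # each task simply returns the PID of the worker that ran it
    return os.getpid()

if __name__ == "__main__":
    # recycle each worker after only 5 tasks so the effect is easy to see
    with mp.Pool(processes=2, maxtasksperchild=5) as pool:
        pids = pool.map(report, range(40), chunksize=1)
    # more than 2 distinct PIDs means workers were restarted along the way
    print(sorted(set(pids)))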