These problems usually boil down to the following:
The function you are trying to parallelize doesn't require enough CPU resources (i.e., CPU time) to justify parallelizing it!
Sure, when you parallelize with `multiprocessing.Pool(8)`, you could theoretically (but not practically) get an 8x speedup.
However, keep in mind that this isn't free – you gain this parallelization at the expense of the following overhead:
1. Creating a `task` for every chunk (of size `chunksize`) in your `iter` passed to `Pool.map(f, iter)`
2. For each `task`:
   - Serializing the `task`, and the `task`'s return value (think `pickle.dumps()`)
   - Deserializing the `task`, and the `task`'s return value (think `pickle.loads()`)
   - Wasting significant time waiting for `Lock`s on shared-memory `Queue`s, while worker processes and parent processes `get()` and `put()` from/to these `Queue`s
3. One-time cost of calls to `os.fork()` for each worker process, which is expensive
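To get a feel for overhead (2), you can measure the pickle round-trip that `Pool` pays for every task directly. This is a minimal sketch; the payload size is an illustrative choice, not from the post:

```python
import pickle
import time

# A large argument, standing in for data passed to each task.
payload = list(range(1_000_000))

start = time.perf_counter()
blob = pickle.dumps(payload)   # roughly what the parent does per task
restored = pickle.loads(blob)  # roughly what the worker does per task
elapsed = time.perf_counter() - start

print(f"pickle round-trip: {elapsed:.4f}s for {len(blob):,} bytes")
```

If this round-trip takes longer than the function call itself, serialization overhead will eat any parallel speedup.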
In essence, when using `Pool()` you want:
- High CPU resource requirements per function call
- A low data footprint passed to each function call
- A reasonably long `iter` to justify the one-time cost of (3) above
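The checklist above can be sketched as a side-by-side benchmark: a CPU-heavy function that satisfies the first criterion, versus a trivial one where overhead dominates. The function names, argument sizes, and pool size here are illustrative choices, not from the post:

```python
import time
from multiprocessing import Pool

def cpu_heavy(n):
    # CPU-bound work: a good candidate for Pool
    return sum(i * i for i in range(n))

def trivial(n):
    # Near-zero CPU time: Pool overhead will dominate
    return n + 1

if __name__ == "__main__":
    args = [200_000] * 32

    for f in (cpu_heavy, trivial):
        t0 = time.perf_counter()
        with Pool(4) as pool:
            pool.map(f, args, chunksize=8)
        parallel = time.perf_counter() - t0

        t0 = time.perf_counter()
        serial_results = [f(a) for a in args]
        serial = time.perf_counter() - t0

        print(f"{f.__name__}: serial={serial:.3f}s parallel={parallel:.3f}s")
```

On a typical machine, `cpu_heavy` benefits from the pool while `trivial` runs slower in parallel than serially, because fork, pickling, and queue contention outweigh the work itself.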
For a more in-depth exploration, this post and the linked talk walk through how passing large data to `Pool.map()` (and friends) gets you into trouble.
Raymond Hettinger also talks about proper use of Python’s concurrency here.