multiprocessing.Pool() slower than just using ordinary functions

These problems usually boil down to the following:

The function you are trying to parallelize doesn’t require enough CPU resources (i.e. CPU time) to justify parallelization!

Sure, when you parallelize with multiprocessing.Pool(8), you could in theory (though rarely in practice) get an 8x speedup.

However, keep in mind that this isn’t free – you gain this parallelization at the expense of the following overhead:

  1. Creating a task for every chunk (of size chunksize) in the iterable passed to Pool.map(f, iterable)
  2. For each task
    1. Serialize the task, and the task's return value (think pickle.dumps())
    2. Deserialize the task, and the task's return value (think pickle.loads())
    3. Waste significant time waiting for Locks on shared memory Queues, while worker processes and parent processes get() and put() from/to these Queues.
  3. One-time cost of starting each worker process (a call to os.fork() on platforms that use the fork start method), which is expensive.
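To get a feel for these costs, here is a minimal sketch that times a deliberately trivial function run serially and through a pool. The names cheap, pickle_round_trip, run_serial and run_pool are made up for illustration, and the worker count and chunksize are arbitrary assumptions; actual timings depend on your machine and start method.

```python
import pickle
import time
from multiprocessing import Pool


def cheap(x):
    # Trivial work: far cheaper than shipping x to another process and back.
    return x + 1


def pickle_round_trip(chunk):
    """Return (copy, seconds) for one pickle.dumps()/pickle.loads() cycle."""
    start = time.perf_counter()
    copy = pickle.loads(pickle.dumps(chunk))
    return copy, time.perf_counter() - start


def run_serial(data):
    """Apply cheap() in the current process and time it."""
    start = time.perf_counter()
    result = [cheap(x) for x in data]
    return result, time.perf_counter() - start


def run_pool(data, workers=4, chunksize=1000):
    """Apply cheap() through a Pool; the timing includes worker startup."""
    start = time.perf_counter()
    with Pool(workers) as pool:
        result = pool.map(cheap, data, chunksize=chunksize)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    data = list(range(200_000))
    _, pickle_s = pickle_round_trip(data[:1000])
    serial_result, serial_s = run_serial(data)
    pool_result, pool_s = run_pool(data)
    assert serial_result == pool_result
    print(f"pickle round trip for one 1000-item chunk: {pickle_s:.6f}s")
    print(f"serial: {serial_s:.4f}s   pool: {pool_s:.4f}s")
```

With a function this cheap, the pool version will typically lose, because almost all of its time goes to the overhead items listed above rather than to cheap() itself.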

In essence, when using Pool() you want:

  1. High CPU resource requirements
  2. Low data footprint passed to each function call
  3. A reasonably long iterable, to justify the one-time process startup cost (item 3 in the overhead list above).
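As a rough illustration of a workload that fits this profile, the sketch below counts primes by trial division. Each task receives only a small (start, stop) tuple and returns a single integer, while the work per task is CPU-heavy. The helpers count_primes and make_ranges are hypothetical names, and the limit, number of pieces and worker count are arbitrary assumptions.

```python
from multiprocessing import Pool


def count_primes(bounds):
    """Count primes in [start, stop) by trial division (deliberately CPU-heavy)."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        # n is prime if no d in [2, sqrt(n)] divides it.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def make_ranges(limit, pieces):
    """Split [0, limit) into at most `pieces` contiguous (start, stop) tuples."""
    step = -(-limit // pieces)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


if __name__ == "__main__":
    # Small arguments in, one integer out: very little data to pickle per task.
    ranges = make_ranges(200_000, 32)
    with Pool(4) as pool:
        total = sum(pool.map(count_primes, ranges))
    print(f"primes below 200000: {total}")
```

Splitting the range into more pieces than workers is a judgment call here: later ranges take longer under trial division, so smaller pieces help keep the workers evenly loaded.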

For a more in-depth exploration, this post and linked talk walk through how large data being passed to Pool.map() (and friends) gets you into trouble.

Raymond Hettinger also talks about proper use of Python’s concurrency here.
