What is yarn-client mode in Spark?

In Spark you have two different components: the driver and the workers (the executors). In yarn-cluster mode the driver runs remotely on a data node, inside the YARN application master, and the workers run on separate data nodes. In yarn-client mode the driver runs on the machine that started the job and the workers run on the data nodes. In local mode the driver and the workers all run on the machine that started the job.
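As a rough sketch of how the three modes are selected when launching a job, the small helper below builds the matching `spark-submit` command line. The helper name `spark_submit_args` and the application file `my_job.py` are made up for illustration; the `--master` and `--deploy-mode` flags are the standard `spark-submit` options. It assumes a reasonably recent Spark, where the mode is written as `--master yarn --deploy-mode client` rather than the older `--master yarn-client` spelling.

```python
import shlex


def spark_submit_args(mode, app="my_job.py"):
    """Return the spark-submit argument list for one of the three modes.

    `app` is a hypothetical application file used only for illustration.
    """
    if mode == "local":
        # Driver and workers all run on the launching machine;
        # local[*] uses every available local core.
        flags = ["--master", "local[*]"]
    elif mode in ("client", "cluster"):
        # On YARN, --deploy-mode decides where the driver runs:
        # "client" = the launching machine, "cluster" = a node in the cluster.
        flags = ["--master", "yarn", "--deploy-mode", mode]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["spark-submit", *flags, app]


for mode in ("client", "cluster", "local"):
    # shlex.join quotes arguments such as local[*] so the line is shell-safe.
    print(shlex.join(spark_submit_args(mode)))
```

Only the flags change between the modes; the application code itself stays the same.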

When you run .collect(), the data from the worker nodes gets pulled into the driver. The driver is basically where the final bit of processing happens.
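To make the data movement concrete, here is a toy model in plain Python. It is not Spark code: the nested lists merely stand in for partitions held on different workers, and the `collect` and `take` functions are simplified stand-ins named after the RDD methods. The point it illustrates is that collecting copies everything into the driver, so the driver needs enough memory to hold the full result, while taking a few rows copies only what was asked for.

```python
# Toy model only: plain Python lists stand in for partitions held by workers.
worker_partitions = [
    [1, 2, 3],     # partition on worker A
    [4, 5],        # partition on worker B
    [6, 7, 8, 9],  # partition on worker C
]


def collect(partitions):
    """Copy every element from every partition into one driver-side list."""
    driver_result = []
    for part in partitions:
        driver_result.extend(part)
    return driver_result


def take(partitions, n):
    """Copy only the first n elements, stopping once enough have arrived."""
    driver_result = []
    for part in partitions:
        for item in part:
            if len(driver_result) == n:
                return driver_result
            driver_result.append(item)
    return driver_result


print(collect(worker_partitions))  # all nine elements end up on the driver
print(take(worker_partitions, 2))  # only two elements are copied
```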

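For myself, I have found yarn-cluster mode to be better when I’m at home on the VPN, since the driver then sits in the cluster next to the workers instead of talking to them over the VPN, but yarn-client mode is better when I’m running code from within the data center.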

Yarn-client mode also means the driver does not tie up resources on a worker node, so you leave a bit more of the cluster free for the workers.
