Which cluster type should I choose for Spark?

Spark Standalone Manager: a simple cluster manager included with Spark that makes it easy to set up a cluster. By default, each application uses all the available nodes in the cluster.
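The "uses all the available nodes" default can be capped at submit time. As a minimal sketch, the helpers below assemble `spark-submit` argument lists for a standalone cluster and, for comparison, for YARN. The helper names, the master URL, the application file `my_app.py`, and the resource sizes are assumptions made for illustration; the flags themselves are standard `spark-submit` options.

```python
import shlex


def standalone_submit(master_url, app, total_cores, executor_memory="2g"):
    """Build a spark-submit command that caps cores on a standalone cluster."""
    return [
        "spark-submit",
        "--master", master_url,
        # Upper bound on cores for this application across the cluster
        # (standalone and Mesos only); corresponds to spark.cores.max.
        "--total-executor-cores", str(total_cores),
        "--executor-memory", executor_memory,
        app,
    ]


def yarn_submit(app, num_executors, executor_cores=2,
                executor_memory="2g", queue="default"):
    """Build a spark-submit command that asks YARN for a fixed executor count."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # YARN scheduler queue the application is submitted to.
        "--queue", queue,
        # Number of executors to request from YARN.
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        app,
    ]


if __name__ == "__main__":
    # Hypothetical master URL and application file, for illustration only.
    print(shlex.join(standalone_submit("spark://master-host:7077", "my_app.py", 8)))
    print(shlex.join(yarn_submit("my_app.py", num_executors=4)))
```

Building the command as a list rather than a single string keeps each flag next to its value and avoids quoting problems if it is later passed to `subprocess.run`.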

A few benefits of YARN over Standalone & Mesos:

  1. YARN allows you to dynamically share and centrally configure the same pool of cluster resources between all frameworks that run on YARN.

  2. You can take advantage of all the features of YARN schedulers for categorizing, isolating, and prioritizing workloads.

  3. By default, Spark standalone mode runs an executor for each application on every worker node in the cluster, whereas with YARN you choose the number of executors to use.

  4. YARN directly handles rack and machine locality in your requests, which is convenient.

  5. The resource request model is, oddly, backwards in Mesos. In YARN, you (the framework) request containers with a given specification and give locality preferences. In Mesos, you receive resource “offers” and choose to accept or reject them based on your own scheduling policy. The Mesos model is arguably more flexible, but seemingly more work for the person implementing the framework.

  6. If you already have a big Hadoop cluster in place, YARN is the better choice.

  7. The Standalone manager requires the user to configure each of the nodes with the shared secret. Mesos’ default authentication module, Cyrus SASL, can be replaced with a custom module. YARN provides security for authentication, service-level authorization, authentication for Web consoles, and data confidentiality. Hadoop uses Kerberos to verify the identity of each user and service.

  8. High availability is offered by all three cluster managers, but Hadoop YARN doesn’t need to run a separate ZooKeeper Failover Controller.
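The difference between the request model and the offer model in point 5 can be illustrated with a toy simulation. This is only a sketch: it does not use the real YARN or Mesos APIs, and the `Resources` class, the node names, the capacities, and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    host: str
    cores: int
    memory_mb: int


# Hypothetical free capacity per node; not read from any real cluster.
FREE = [
    Resources("node-a", cores=2, memory_mb=4096),
    Resources("node-b", cores=8, memory_mb=16384),
    Resources("node-c", cores=4, memory_mb=8192),
]


def yarn_style_request(free, cores, memory_mb, preferred_hosts=()):
    """Request model: the framework states what it needs plus locality
    preferences, and the resource manager picks the node."""
    fits = [r for r in free if r.cores >= cores and r.memory_mb >= memory_mb]
    # Honor the locality preference if possible, else take any node that fits.
    preferred = [r for r in fits if r.host in preferred_hosts]
    chosen = (preferred or fits or [None])[0]
    return chosen.host if chosen else None


def mesos_style_offers(free, accept):
    """Offer model: the master offers resources and the framework's own
    policy (the `accept` callback) decides which offers to take."""
    return [offer.host for offer in free if accept(offer)]


if __name__ == "__main__":
    # Framework asks for 4 cores / 8 GB, preferring node-c.
    print(yarn_style_request(FREE, 4, 8192, preferred_hosts=("node-c",)))  # node-c
    # Framework inspects every offer and keeps those with at least 4 cores.
    print(mesos_style_offers(FREE, lambda o: o.cores >= 4))  # ['node-b', 'node-c']
```

In the first function the placement logic lives on the cluster manager's side. In the second, the framework has to supply its own `accept` policy, which is where both the extra flexibility and the extra work come from.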

Useful links:

Spark documentation page

AgilData article
