How to prevent Spark Executors from getting Lost when using YARN client mode?

I had a very similar problem: many executors were being lost no matter how much memory we allocated to them. If you’re using YARN, the solution was to set --conf spark.yarn.executor.memoryOverhead=600; alternatively, if your cluster uses Mesos, you can try --conf spark.mesos.executor.memoryOverhead=600 instead. In Spark 2.3.1+ the configuration option is now --conf spark.executor.memoryOverhead=600. It … Read more
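
As a concrete sketch of how that setting is passed, here are spark-submit invocations for both config names; the main class and jar are placeholder names, and the 600 MB value is the one used above:

    # Pre-Spark-2.3 name (YARN-specific):
    spark-submit \
      --master yarn \
      --conf spark.yarn.executor.memoryOverhead=600 \
      --class com.example.MyApp \
      my-app.jar

    # Spark 2.3.1+ name for the same setting:
    spark-submit \
      --master yarn \
      --conf spark.executor.memoryOverhead=600 \
      --class com.example.MyApp \
      my-app.jar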

Where are logs in Spark on YARN?

You can access logs through the command yarn logs -applicationId <application ID> [OPTIONS]. The general options are:

-appOwner <Application Owner>: AppOwner (assumed to be the current user if not specified)
-containerId <Container ID>: ContainerId (must be specified if node address is specified)
-nodeAddress <Node Address>: NodeAddress in the format nodename:port (must be specified if … Read more
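
For example, a minimal sketch of fetching logs for a whole application and then for a single container; the application ID, container ID, and user below are placeholders:

    # All logs for an application (appOwner defaults to the current user):
    yarn logs -applicationId application_1452073024926_0001

    # Logs for one container, submitted by a different user:
    yarn logs -applicationId application_1452073024926_0001 \
      -containerId container_1452073024926_0001_01_000001 \
      -appOwner someuser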

Spark yarn cluster vs client – how to choose which one to use?

A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. the master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process, which acts as a client to the … Read more
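
In practice, the choice surfaces as the --deploy-mode flag to spark-submit; a minimal sketch, with the class and jar names as placeholders:

    # Client mode: the driver runs inside this spark-submit process on the gateway machine
    spark-submit --master yarn --deploy-mode client \
      --class com.example.MyApp my-app.jar

    # Cluster mode: the driver is shipped to and runs inside the cluster
    spark-submit --master yarn --deploy-mode cluster \
      --class com.example.MyApp my-app.jar

Client mode ties the driver’s lifetime to the spark-submit process, so it suits interactive use from a co-located gateway; cluster mode suits jobs submitted from farther away.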

Hadoop truncated/inconsistent counter name

There’s nothing in the Hadoop code that truncates counter names after they are initialized. So, as you’ve already pointed out, mapreduce.job.counters.counter.name.max controls the maximum length of a counter’s name (with 64 symbols as the default value). This limit is applied during calls to AbstractCounterGroup.addCounter/findCounter. The respective source code is the following:

@Override
public synchronized T addCounter(String counterName, String displayName, long value) { … Read more
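
Since the limit is read from the job configuration, one way to raise it is at submission time; a sketch, assuming a hypothetical job class launched through ToolRunner so that -D generic options are honored:

    # Raise the maximum counter-name length from the default 64 to 160 characters
    hadoop jar my-job.jar com.example.MyJob \
      -D mapreduce.job.counters.counter.name.max=160 \
      input/ output/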

Which cluster type should I choose for Spark?

Spark Standalone Manager: A simple cluster manager included with Spark that makes it easy to set up a cluster. By default, each application uses all the available nodes in the cluster. A few benefits of YARN over Standalone & Mesos: YARN allows you to dynamically share and centrally configure the same pool of cluster … Read more
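
Whichever manager you pick shows up as the --master URL passed to spark-submit; a sketch with host names, ports, and the application class/jar as placeholders:

    # Spark Standalone
    spark-submit --master spark://master-host:7077 --class com.example.MyApp my-app.jar

    # YARN (cluster location is taken from HADOOP_CONF_DIR / YARN_CONF_DIR)
    spark-submit --master yarn --class com.example.MyApp my-app.jar

    # Mesos
    spark-submit --master mesos://mesos-host:5050 --class com.example.MyApp my-app.jar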

Apache Spark: The number of cores vs. the number of executors

To hopefully make all of this a little more concrete, here’s a worked example of configuring a Spark app to use as much of the cluster as possible: Imagine a cluster with six nodes running NodeManagers, each equipped with 16 cores and 64GB of memory. The NodeManager capacities, yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, should probably be set … Read more
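
The excerpt is cut off before the concrete numbers, but to illustrate the shape of the calculation: if roughly 1 core and 1GB per node are reserved for the OS and Hadoop daemons, the NodeManager capacities land just below the physical 16 cores / 64GB, and a submit command in that spirit could look like the sketch below; treat every value as an illustrative assumption rather than the answer’s conclusion:

    # Illustrative only: six 16-core/64GB nodes, with ~1 core and ~1GB per node
    # reserved for the OS and Hadoop daemons (all numbers are assumptions)
    spark-submit \
      --master yarn \
      --num-executors 17 \
      --executor-cores 5 \
      --executor-memory 19G \
      --class com.example.MyApp \
      my-app.jar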