How to fix symbol lookup error: undefined symbol errors in a cluster environment

After two dozens of comments to understand the situation, it was found that the libhdf5.so.7 was actually a symlink (with several levels of indirection) to a file that was not shared between the queued processes and the interactive processes. This means even though the symlink itself lies on a shared filesystem, the contents of the … Read more

Easy way to use parallel options of scikit-learn functions on HPC

SKLearn manages its parallelism with Joblib. Joblib can swap out the multiprocessing backend for other distributed systems like dask.distributed or IPython Parallel. See this issue on the sklearn github page for details. Example using Joblib with Dask.distributed Code taken from the issue page linked above. from sklearn.externals.joblib import parallel_backend search = RandomizedSearchCV(model, param_space, cv=10, n_iter=1000, … Read more

How to set amount of Spark executors?

In Spark 2.0+ version use spark session variable to set number of executors dynamically (from within program) spark.conf.set(“spark.executor.instances”, 4) spark.conf.set(“spark.executor.cores”, 4) In above case maximum 16 tasks will be executed at any given time. other option is dynamic allocation of executors as below – spark.conf.set(“spark.dynamicAllocation.enabled”, “true”) spark.conf.set(“spark.executor.cores”, 4) spark.conf.set(“spark.dynamicAllocation.minExecutors”,”1″) spark.conf.set(“spark.dynamicAllocation.maxExecutors”,”5″) This was you can let … Read more

What are the differences between a node, a cluster and a datacenter in a cassandra nosql database?

The hierarchy of elements in Cassandra is: Cluster Data center(s) Rack(s) Server(s) Node (more accurately, a vnode) A Cluster is a collection of Data Centers. A Data Center is a collection of Racks. A Rack is a collection of Servers. A Server contains 256 virtual nodes (or vnodes) by default. A vnode is the data … Read more

Difference between Clustering and Load balancing? [closed]

From Software journal blog an extract. Clustering has a formal meaning. A cluster is a group of resources that are trying to achieve a common objective, and are aware of one another. Clustering usually involves setting up the resources (servers usually) to exchange details on a particular channel (port) and keep exchanging their states, so … Read more

Spread vs MPI vs zeromq?

MPI was deisgned tightly-coupled compute clusters with fast, reliable networks. Spread and ØMQ are designed for large distributed systems. If you’re designing a parallel scientific application, go with MPI, but if you are designing a persistent distributed system that needs to be resilient to faults and network instability, use one of the others. MPI has … Read more