Scala: How can I replace a value in DataFrames using Scala?

Spark 1.6.2, Java code (sorry). This changes every instance of Tesla to S in the "make" column across the entire dataframe, without passing through an RDD:
dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S").otherwise(col("make")));
Edited to add the "otherwise" clause suggested by @marshall245, which ensures non-Tesla values aren't converted to NULL.
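
Since the question asks about Scala, here is a rough Scala equivalent of the same when/otherwise approach. It is only a sketch: the object name ReplaceMake, the helper replaceTesla, and the sample rows are made up for illustration, and the small driver in main assumes Spark 2.0 or later (SparkSession); on Spark 1.6 the replaceTesla helper itself should work unchanged with a DataFrame built from a SQLContext.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object ReplaceMake {
  // Replace "Tesla" with "S" in the "make" column.
  // The otherwise(...) branch keeps every other value as it was;
  // without it, non-matching rows would become NULL.
  def replaceTesla(df: DataFrame): DataFrame =
    df.withColumn("make", when(col("make") === "Tesla", "S").otherwise(col("make")))

  // Hypothetical driver with sample data, assuming Spark 2.0+.
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("replace-make")
      .getOrCreate()
    import spark.implicits._

    val cars = Seq(("Tesla", 2016), ("Ford", 2015)).toDF("make", "year")
    replaceTesla(cars).show()

    spark.stop()
  }
}
```

As in the Java version, the replacement stays inside the DataFrame API, so there is no round trip through an RDD.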

How to disable package and publish tasks for root aggregate module in multi-module build?

Instead of playing whac-a-mole by listing specific tasks to disable (publish, publish-local, publish-signed, etc.), another option is to turn off artifact publishing at the source:
publishArtifact := false
Even though there's no publishing happening, I also found I needed to supply a publishTo value to make sbt-pgp's publish-signed task happy. It needs this value, even …
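
To show where these two settings might sit in a multi-module build, here is a minimal build.sbt sketch. The subproject names core and util and the "unused-repo" resolver path are placeholders chosen for illustration, not taken from the original answer.

```scala
// build.sbt (sketch; project names and the resolver path are hypothetical)
lazy val root = (project in file("."))
  .aggregate(core, util)
  .settings(
    // Turn off artifact publishing for the aggregate root only.
    publishArtifact := false,
    // Placeholder target so tasks that insist on a publishTo value
    // (such as sbt-pgp's publish-signed) still have one to read.
    publishTo := Some(Resolver.file("unused-repo", file("target/unused-repo")))
  )

lazy val core = project in file("core")
lazy val util = project in file("util")
```

Because the settings are scoped to root, the aggregated modules keep their normal publishing behaviour.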

In Scala; should I use the App trait?

The problem with the Application trait is actually described in its documentation: (1) Threaded code that references the object will block until static initialization is complete. However, because the entire execution of an object extending Application takes place during static initialization, concurrent code will always deadlock if it must synchronize with the enclosing object. This …
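
One way to stay clear of the static-initialization issue quoted above is an explicit main method, so the program body runs only after the object has been fully initialized. The sketch below is illustrative only: the object name Worker, its greeting field, and the runInThread helper are invented for the example.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

object Worker {
  // Initialized during object construction, before main or runInThread runs.
  val greeting: String = "hello from a worker thread"

  // Starts a thread that reads a member of the enclosing object.
  // Returns true if the thread finished within the timeout.
  def runInThread(): Boolean = {
    val done = new CountDownLatch(1)
    val thread = new Thread(new Runnable {
      def run(): Unit = {
        // Worker is already initialized here, so this access does not block.
        println(greeting)
        done.countDown()
      }
    })
    thread.start()
    done.await(5, TimeUnit.SECONDS)
  }

  def main(args: Array[String]): Unit =
    println(s"worker finished: ${runInThread()}")
}
```

With the old Application trait, the equivalent code sat in the object body and therefore ran during static initialization, which is the situation the quoted documentation warns about.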

How can I update a broadcast variable in spark streaming?

Extending the answer by @Rohan Aletty. Here is sample code for a BroadcastWrapper that refreshes the broadcast variable based on a TTL:
public class BroadcastWrapper {
    private Broadcast<ReferenceData> broadcastVar;
    private Date lastUpdatedAt = Calendar.getInstance().getTime();
    private static BroadcastWrapper obj = new BroadcastWrapper();
    private BroadcastWrapper(){}
    public static BroadcastWrapper getInstance() {
        return obj;
    }
    public JavaSparkContext getSparkContext(SparkContext sc) …