How to restrict actor messages to specific types?

Then you’d have to encode the message type into the ActorRef, which would drastically decrease the value of something like the ActorRegistry. Also, with powerful mechanics like “become” (which is fundamental to the actor model), typing the messages is less valuable. Since Akka doesn’t leak memory when a message is not matched to the …
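
The remark about “become” can be illustrated with a plain-Scala toy model. This is not Akka: ToyActor, tell and log are hypothetical names, and there is no mailbox or concurrency. In classic Akka an actor’s behavior is a Receive, i.e. PartialFunction[Any, Unit], and context.become swaps it at runtime. The sketch mimics that, so the set of messages the actor accepts changes over its lifetime, which is hard to capture in one static message type on the reference.

```scala
import scala.collection.mutable.ListBuffer

object BecomeSketch {
  // Same shape as classic Akka's Actor.Receive alias.
  type Receive = PartialFunction[Any, Unit]

  // Toy single-threaded "actor": no mailbox, messages are handled synchronously.
  final class ToyActor {
    val log: ListBuffer[String] = ListBuffer.empty[String]
    private var behavior: Receive = idle

    // Swap the current behavior, in the spirit of Akka's context.become.
    def become(next: Receive): Unit = behavior = next

    // Deliver a message; anything the current behavior does not match is only logged.
    def tell(msg: Any): Unit =
      if (behavior.isDefinedAt(msg)) behavior(msg) else log += s"unhandled: $msg"

    // Initially only "start" is accepted.
    private def idle: Receive = {
      case "start" =>
        log += "started"
        become(running)
    }

    // After "start", Int payloads and "stop" are accepted instead.
    private def running: Receive = {
      case n: Int => log += s"got $n"
      case "stop" =>
        log += "stopped"
        become(idle)
    }
  }
}
```

Sending 42 before "start" is merely logged as unhandled, while the same kind of message is handled after the behavior switch.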

Spray, Akka-http and Play: which is the best bet for a new HTTP/REST project?

Spray is production-ready, but the development team (Mathias Doenitz) now works for Typesafe on Akka-http. The status of Akka-http is “development preview”. There are vague promises of a full release “within a few months”, but nothing you can take to the bank. Edited 29-July-2015: The status of Akka-HTTP is now “release candidate” with version …

Why does Spark fail with “Detected cartesian product for INNER join between logical plans”?

You can trigger the inner join after turning on the flag: spark.conf.set("spark.sql.crossJoin.enabled", "true"). You could also use a cross join explicitly, weights.crossJoin(input), or pass the "cross" join type to join: weights.join(input, input("sourceId") === weights("sourceId"), "cross"). You can find more in the issue SPARK-6459, which is said to be fixed in 2.1.1. As you have already used 2.1.1, the issue should have …
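
For intuition, here is a plain-Scala analogue on in-memory collections (not Spark; Weight, Input and their fields are hypothetical stand-ins for the rows of the two DataFrames). It shows why Spark guards against cartesian products: a cross join pairs every row with every other row, so its size is the product of the two input sizes, while an inner equi-join keeps only the pairs that satisfy the join condition.

```scala
object CrossJoinSketch {
  // Hypothetical row types standing in for the two DataFrames.
  final case class Weight(sourceId: Int, w: Double)
  final case class Input(sourceId: Int, value: String)

  // Cartesian product: every weight paired with every input (n * m pairs).
  def crossJoin(ws: Seq[Weight], in: Seq[Input]): Seq[(Weight, Input)] =
    for { w <- ws; i <- in } yield (w, i)

  // Inner equi-join: the cartesian product restricted by the join condition.
  def innerJoin(ws: Seq[Weight], in: Seq[Input]): Seq[(Weight, Input)] =
    crossJoin(ws, in).filter { case (w, i) => w.sourceId == i.sourceId }
}
```

With 3 weights and 4 inputs the cross join always yields 12 pairs, however many of them actually share a sourceId. On tables with millions of rows that growth is what the default check protects against.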

How to load 100 million records into MongoDB with Scala for performance testing?

Some tips: Do not index your collection before inserting, as every insert also has to update the index, which is an overhead. Insert everything, then create the index. Instead of “save”, use MongoDB “batchinsert”, which can insert many records in one operation; aim for around 5000 documents per batch. You will see a remarkable performance gain. …
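
A minimal sketch of the batching part in plain Scala, assuming the driver offers some bulk-insert call. insertBatch below is a hypothetical function parameter standing in for that call, and Doc is a hypothetical document type, not a MongoDB class. Documents are generated lazily and grouped into batches of 5000, so the full 100 million records never sit in memory at once. Following the tips above, the index would be created only after loadInBatches returns.

```scala
object BatchInsertSketch {
  // Hypothetical document type; a real loader would build driver documents instead.
  final case class Doc(id: Long, payload: String)

  // Lazily generates `total` synthetic documents (nothing is materialized up front).
  def syntheticDocs(total: Long): Iterator[Doc] =
    Iterator.iterate(0L)(_ + 1).takeWhile(_ < total).map(i => Doc(i, s"record-$i"))

  // Feeds documents to `insertBatch` in groups of `batchSize`; returns the number of batches sent.
  // `insertBatch` is a stand-in for the driver's bulk-insert call (an assumption, not a real API).
  def loadInBatches(docs: Iterator[Doc], batchSize: Int)(insertBatch: Seq[Doc] => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    var batches = 0
    docs.grouped(batchSize).foreach { batch =>
      insertBatch(batch)
      batches += 1
    }
    batches
  }
}
```

For the full load, loadInBatches(syntheticDocs(100000000L), 5000)(...) would make 20,000 calls to the bulk-insert stand-in.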

Scala median implementation

Immutable Algorithm: The first algorithm indicated by Taylor Leese is quadratic in the worst case, but linear on average. That, however, depends on the pivot selection. So I’m providing here a version with pluggable pivot selection, along with both the random pivot and the median-of-medians pivot (which guarantees linear time). import scala.annotation.tailrec @tailrec def findKMedian(arr: Array[Double], …
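
Because the answer’s code is truncated, here is a separate minimal sketch of an immutable quickselect with a pluggable pivot. It assumes the input contains no NaN values, since NaN compares unequal to everything and would stop the partitioning from shrinking. The names MedianSketch, PivotFn, findKth, randomPivot and medianOfMediansPivot are illustrative and are not the answer’s findKMedian. Each step partitions the array around a pivot taken from it into lower, equal and upper parts, then recurses only into the part that contains the k-th element.

```scala
import scala.annotation.tailrec
import scala.util.Random

object MedianSketch {
  // Pivot strategy: picks a pivot value that is an element of the (non-empty) array.
  type PivotFn = Array[Double] => Double

  // Random pivot: expected linear time.
  val randomPivot: PivotFn = arr => arr(Random.nextInt(arr.length))

  // Median-of-medians pivot (groups of 5): guarantees linear worst-case time.
  def medianOfMediansPivot: PivotFn = arr => {
    val medians = arr.grouped(5).map { g => val s = g.sorted; s(s.length / 2) }.toArray
    if (medians.length <= 5) medians.sorted.apply(medians.length / 2)
    else findKth(medians, medians.length / 2, medianOfMediansPivot)
  }

  // k-th smallest element (0-based); builds new arrays instead of mutating the input.
  @tailrec
  def findKth(arr: Array[Double], k: Int, choosePivot: PivotFn): Double = {
    val pivot = choosePivot(arr)
    val (lower, rest) = arr.partition(_ < pivot)
    val (equal, upper) = rest.partition(_ == pivot)
    if (k < lower.length) findKth(lower, k, choosePivot)
    else if (k < lower.length + equal.length) pivot
    else findKth(upper, k - lower.length - equal.length, choosePivot)
  }

  // Median: middle element for odd sizes, mean of the two middle elements for even sizes.
  def median(arr: Array[Double], choosePivot: PivotFn = randomPivot): Double = {
    require(arr.nonEmpty, "median of an empty array")
    val n = arr.length
    if (n % 2 == 1) findKth(arr, n / 2, choosePivot)
    else (findKth(arr, n / 2 - 1, choosePivot) + findKth(arr, n / 2, choosePivot)) / 2
  }
}
```

Since the pivot is always an element of the array, the equal part is never empty and every recursive call works on a strictly smaller array. The median-of-medians strategy trades a larger constant factor for the linear worst-case bound.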

Scala parallel collections degree of parallelism

With the newest trunk, using JVM 1.6 or newer, use: collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism(parlevel: Int). This may be subject to change in the future, though. A more unified approach to configuring all Scala task-parallel APIs is planned for the next releases. Note, however, that while this will determine the number of processors the query …

How to obtain the symmetric difference between two DataFrames?

You can always rewrite it as: df1.unionAll(df2).except(df1.intersect(df2)). Seriously though, UNION, INTERSECT and EXCEPT / MINUS are pretty much the standard set of SQL combining operators. I am not aware of any system which provides an XOR-like operation out of the box, most likely because it is trivial to implement using the other three and there …
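
The same identity holds on plain Scala sets, which makes it easy to check in isolation. This is a sketch on immutable Set values, not DataFrames. Spark’s except behaves like EXCEPT DISTINCT, so the DataFrame result has no duplicate rows, much like a Set. Note also that unionAll has been deprecated since Spark 2.0 in favor of union.

```scala
object SymmetricDifferenceSketch {
  // Same shape as df1.unionAll(df2).except(df1.intersect(df2)).
  def viaUnionExcept[A](a: Set[A], b: Set[A]): Set[A] =
    (a union b) diff (a intersect b)

  // Equivalent formulation: elements only in a, plus elements only in b.
  def viaTwoExcepts[A](a: Set[A], b: Set[A]): Set[A] =
    (a diff b) union (b diff a)
}
```

For Set(1, 2, 3) and Set(2, 3, 4) both functions give Set(1, 4). The second form maps to df1.except(df2).union(df2.except(df1)) and avoids computing the intersection.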