Iterator versus Stream of Java 8

There’s lots of performance advice here, but sadly much of it is guesswork, and little of it points to the real performance considerations.

@Holger gets it right by pointing out that we should resist the seemingly overwhelming tendency to let the performance tail wag the API design dog.

While there are a zillion considerations that can make a stream slower than, the same as, or faster than some other form of traversal in any given case, there are some factors that point to streams having a performance advantage where it counts — on big data sets.

There is some additional fixed startup overhead of creating a Stream compared to creating an Iterator — a few more objects before you start calculating. If your data set is large, it doesn’t matter; it’s a small startup cost amortized over a lot of computation. (And if your data set is small, it probably also doesn’t matter — because if your program is operating on small data sets, performance is generally not your #1 concern either.) Where this does matter is when going parallel: any time spent setting up the pipeline goes into the serial fraction of Amdahl’s law. If you look at the implementation, we work hard to keep the object count down during stream setup, but I’d be happy to find ways to reduce it, as that has a direct effect on the break-even data set size at which parallel starts to win over sequential.
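To make the fixed cost concrete, here’s a small sketch (my own example, not a benchmark) computing the same sum both ways. The stream version allocates a Spliterator plus a few pipeline objects up front; that is the one-time setup cost being amortized over the traversal:

```java
import java.util.List;

public class SetupCostSketch {
    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5);

        // Iterator-based traversal: the for-each loop creates one Iterator.
        long iterSum = 0;
        for (int i : data) {
            iterSum += i;
        }

        // Stream-based traversal: a Spliterator plus a few pipeline
        // objects are created up front, then amortized over all elements.
        long streamSum = data.stream().mapToLong(Integer::longValue).sum();

        System.out.println(iterSum + " " + streamSum); // prints 15 15
    }
}
```

Both loops do identical work per element; the difference is only in the handful of objects created before the first element is touched.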

But, more important than the fixed startup cost is the per-element access cost. Here, streams actually win — and often win big — which some may find surprising. (In our performance tests, we routinely see stream pipelines which can outperform their for-loop over Collection counterparts.) And, there’s a simple explanation for this: Spliterator has fundamentally lower per-element access costs than Iterator, even sequentially. There are several reasons for this.

  1. The Iterator protocol is fundamentally less efficient. It requires calling two methods to get each element. Further, because Iterators must be robust to things like calling next() without hasNext(), or hasNext() multiple times without next(), both of these methods generally have to do some defensive coding (and generally more statefulness and branching), which adds to inefficiency. On the other hand, even the slow way to traverse a spliterator (tryAdvance) doesn’t have this burden. (It’s even worse for concurrent data structures, because the next/hasNext duality is fundamentally racy, and Iterator implementations have to do more work to defend against concurrent modifications than do Spliterator implementations.)

  2. Spliterator further offers a “fast-path” iteration — forEachRemaining — which can be used most of the time (reduction, forEach), further reducing the overhead of the iteration code that mediates access to the data structure internals. This also tends to inline very well, which in turn increases the effectiveness of other optimizations such as code motion, bounds check elimination, etc.

  3. Further, traversal via Spliterator tends to have many fewer heap writes than traversal via Iterator. With Iterator, every element causes one or more heap writes (unless the Iterator can be scalarized via escape analysis and its fields hoisted into registers). Among other issues, this causes GC card-mark activity, leading to cache-line contention for the card marks. On the other hand, Spliterators tend to have less state, and industrial-strength forEachRemaining implementations tend to defer writing anything to the heap until the end of the traversal, instead storing their iteration state in locals that naturally map to registers, resulting in reduced memory bus activity.
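To make points 1 and 2 concrete, here’s a small sketch (my own example) traversing the same list three ways: via the two-method Iterator protocol, via the tryAdvance “slow path”, and via the forEachRemaining fast path:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;

public class TraversalProtocols {
    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c");

        // Iterator: two method calls per element, and next() must
        // defensively re-check state in case hasNext() was skipped.
        StringBuilder viaIterator = new StringBuilder();
        for (Iterator<String> it = data.iterator(); it.hasNext(); ) {
            viaIterator.append(it.next());
        }

        // Spliterator slow path: tryAdvance fuses the "is there an
        // element?" test and the element access into a single call.
        StringBuilder viaTryAdvance = new StringBuilder();
        Spliterator<String> sp = data.spliterator();
        while (sp.tryAdvance(viaTryAdvance::append)) { }

        // Spliterator fast path: forEachRemaining hands the entire
        // remaining sequence to the consumer in one tight internal loop.
        StringBuilder viaBulk = new StringBuilder();
        data.spliterator().forEachRemaining(viaBulk::append);

        // prints abc abc abc
        System.out.println(viaIterator + " " + viaTryAdvance + " " + viaBulk);
    }
}
```

All three produce the same result; the difference is how many calls and state checks stand between the loop and the data structure’s internals.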

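As an illustration of point 3, here’s a hypothetical minimal spliterator over an int array (ArrayIntSpliterator is my own name and simplification, not a JDK class) showing the deferred-write pattern: forEachRemaining hoists the iteration state into locals and writes back to the heap only once, when the traversal ends.

```java
import java.util.Spliterators;
import java.util.function.IntConsumer;

// Sketch of the pattern described above, under the assumption of a
// simple array-backed source; real JDK spliterators are more involved.
class ArrayIntSpliterator extends Spliterators.AbstractIntSpliterator {
    private final int[] array;
    private int index; // heap state

    ArrayIntSpliterator(int[] array) {
        super(array.length, SIZED | ORDERED);
        this.array = array;
    }

    @Override
    public boolean tryAdvance(IntConsumer action) {
        if (index < array.length) {
            action.accept(array[index++]); // a heap write per element
            return true;
        }
        return false;
    }

    @Override
    public void forEachRemaining(IntConsumer action) {
        int i = index;             // hoist the field into a local
        int[] a = array;
        while (i < a.length) {
            action.accept(a[i++]); // no heap writes inside the loop
        }
        index = i;                 // single write-back at the end
    }
}

public class LocalStateDemo {
    public static void main(String[] args) {
        long[] sum = new long[1];
        new ArrayIntSpliterator(new int[]{1, 2, 3, 4})
                .forEachRemaining((int v) -> sum[0] += v);
        System.out.println(sum[0]); // prints 10
    }
}
```

The per-element path in tryAdvance mutates the index field on every element; the bulk path touches the heap exactly once, which is what keeps card-mark traffic off the hot loop.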
Summary: don’t worry, be happy. Spliterator is a better Iterator, even without parallelism. (They’re also generally just easier to write and harder to get wrong.)
