When should streams be preferred over traditional loops for best performance? Do streams take advantage of branch-prediction?

I agree to the point that programming with streams is nice and easier for some scenarios but when we’re losing out on performance, why do we need to use them? Performance is rarely an issue. It would be usual for 10% of your streams to need rewriting as loops to get the performance …


Why is a conditional move not vulnerable to Branch Prediction Failure?

Mis-predicted branches are expensive: a modern processor generally executes between one and three instructions each cycle if things go well (that is, if it does not stall waiting for the data these instructions depend on to arrive from previous instructions or from memory). The statement above holds surprisingly well for tight loops, but this shouldn’t blind you to …


Why is processing an unsorted array the same speed as processing a sorted array with modern x86-64 clang?

Several of the answers in the question you link talk about rewriting the code to be branchless and thus avoiding any branch prediction issues. That’s what your updated compiler is doing. Specifically, clang++ 10 with -O3 vectorizes the inner loop. See the code on godbolt, lines 36-67 of the assembly. The code is a little …
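
To make the idea of a branchless rewrite concrete, here is a minimal sketch. It assumes the inner loop sums every array element at or above some threshold; the threshold 128 and the function names are illustrative and not taken from the excerpt. Whether a compiler actually emits a conditional jump, a conditional move, or vectorized code for either version depends on the compiler, its version, and the optimization flags.

```cpp
#include <vector>

// Branchy form: the `if` may compile to a conditional jump, which the
// CPU has to predict. On unsorted data the outcome is hard to predict.
long long sum_branchy(const std::vector<int>& data) {
    long long sum = 0;
    for (int v : data) {
        if (v >= 128) {  // 128 is an assumed threshold for illustration
            sum += v;
        }
    }
    return sum;
}

// Branchless form: the comparison result becomes a bit mask, so the
// loop body contains no data-dependent jump.
long long sum_branchless(const std::vector<int>& data) {
    long long sum = 0;
    for (int v : data) {
        // (v >= 128) converts to 1 or 0; negating gives -1 (all bits set) or 0.
        const int mask = -static_cast<int>(v >= 128);
        sum += v & mask;  // adds v when the mask is all ones, otherwise adds 0
    }
    return sum;
}
```

Both functions return the same result for any input. An optimizing compiler may well turn the first form into something like the second by itself, which is consistent with sorted and unsorted input running at the same speed.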
