Why are elementwise additions much faster in separate loops than in a combined loop?
by Tarik
Answer recommended by Intel